Reasons to Consider Distributed Searching
- Reasons for Distributed Indexing
- A single organization cannot maintain high quality for everything;
specialists can do better, since knowledge is greater closer to its source
(but greater diversity means fewer standards)
- Indexing already needs to be distributed, both for global coverage and to
reduce redundant network and server load; crawlers gather data just-in-case.
- Enables a free market of multiple classifications (ontologies) (but fewer
standards means more chaos)
- Successful central servers are swamped until they fail (but supporters
fund growth to the limit of the technology; how far can that go?).
- Replicating central servers is more expensive and wasteful, i.e.
just-in-time is sometimes cheaper than just-in-case.
- Distributing searching over several servers reduces the load on each
server (but are network load and total server overhead greater?).
- Alternative search services, even for the same specialty, provide an
incentive to improve quality (but this also applies to central servers).
- Full-content searches are feasible on a small scale, but not on a global scale.
- A smaller search service has a lower barrier to entry.
Distributed Searching Requirements
- Distributed Indexing (also consider distributed searching of a central index)
- Organized Semantic Structure - otherwise we're flying blind
- Need either a single standard for query language and result data, or
mappings between a few standards; difficult either way
- How to do query refinement and relevance feedback with distributed searching?
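A minimal sketch of the "mappings between a few standards" option, in Python. The common query form (a list of ANDed terms plus an optional field restriction) and the two target syntaxes ("infix" and "prefix") are hypothetical, invented for illustration; they do not correspond to any real query protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CommonQuery:
    """Hypothetical common query form: every term must match (AND)."""
    terms: List[str]
    field_name: Optional[str] = None  # e.g. "title"; None means any field


def to_infix(q: CommonQuery) -> str:
    """Map to a hypothetical infix boolean syntax: title:(a AND b)."""
    body = " AND ".join(q.terms)
    return f"{q.field_name}:({body})" if q.field_name else body


def to_prefix(q: CommonQuery) -> str:
    """Map to a hypothetical prefix syntax: (and (field title) a b)."""
    parts = ["and"]
    if q.field_name:
        parts.append(f"(field {q.field_name})")
    parts.extend(q.terms)
    return "(" + " ".join(parts) + ")"


# One mapping function per supported scheme; a scheme without an
# entry cannot be queried until someone writes its mapper.
MAPPERS: Dict[str, Callable[[CommonQuery], str]] = {
    "infix": to_infix,
    "prefix": to_prefix,
}


def translate(q: CommonQuery, scheme: str) -> str:
    """Translate the common query into one target scheme."""
    if scheme not in MAPPERS:
        raise ValueError(f"no mapping for query scheme {scheme!r}")
    return MAPPERS[scheme](q)


if __name__ == "__main__":
    q = CommonQuery(["distributed", "indexing"], field_name="title")
    print(translate(q, "infix"))   # title:(distributed AND indexing)
    print(translate(q, "prefix"))  # (and (field title) distributed indexing)
```

Routing every engine through one common form means each new engine needs a single mapper rather than one mapper per pair of engines; the hard part, agreeing on that common form, remains.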
Alternative Architectures for Distributed Searching
- Centralized vs Distributed across many processes
- Immediate vs Distributed (delayed) across time (e.g. agents)
- Collection issues:
- Search server vs collection
- Localized vs Distributed collections
- Provider-specified vs User-specified collections
- Small (<100) vs Large (>100K) collections
- Flat vs Structured collections
- Static vs Dynamic
- Search index (preprocessed metadata) vs content (raw data)
- Flat list of search engines vs two levels (e.g. mediators) vs General
Hierarchy vs Lattice vs Web (with cycles)
- Single vs Multiple indexes per domain area
- Centralized vs Distributed (delegated) control of many processes
- Communication issues:
- Synchronous vs Asynchronous communication between processes
- Connection vs Connectionless
- Client-Server vs Register-Notify
- Stateful vs Stateless
- Resident Search Software vs Uploaded Search Applets.
- Single Standard Queries vs Mapping between query schemes vs no standards
- Single Standard Results (metadata, ranking) vs Mapping between a few
result schemes vs no standards
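As an illustration of the two-level (mediator) architecture combined with mapping between result schemes, the following Python sketch sends one query to several search servers concurrently, rescales each server's scores onto a common 0 to 1 range, and merges the lists. The servers are hypothetical in-process functions standing in for remote services, and min-max score normalization is only one possible result mapping, chosen here for simplicity; it is an assumption, not a standard.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# One server's result list: (document id, server-specific score).
Results = List[Tuple[str, float]]
Server = Callable[[str], Results]


def normalize(results: Results) -> Results:
    """Min-max rescale one server's scores to [0, 1] so that servers
    with different scoring scales can be compared (an assumed mapping)."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def mediate(query: str, servers: Dict[str, Server], k: int = 10) -> Results:
    """Query every server concurrently, normalize each result list,
    keep the best normalized score per document, and return the top k."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        per_server = {name: f.result() for name, f in futures.items()}

    merged: Dict[str, float] = {}
    for results in per_server.values():
        for doc, score in normalize(results):
            merged[doc] = max(score, merged.get(doc, 0.0))
    # Sort by descending score, then by document id for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))[:k]


# Two hypothetical servers that score on very different scales.
def server_a(query: str) -> Results:
    return [("a/1", 12.0), ("a/2", 7.0), ("shared", 2.0)]


def server_b(query: str) -> Results:
    return [("shared", 0.9), ("b/1", 0.4), ("b/2", 0.1)]


if __name__ == "__main__":
    hits = mediate("distributed searching", {"A": server_a, "B": server_b}, k=4)
    for doc, score in hits:
        print(f"{doc}\t{score:.2f}")
```

Keeping the maximum normalized score is a simple way to handle a document returned by several servers; summing scores instead would favor documents held by many servers. Either way, the merge is only as meaningful as the assumption that each server's scores are comparable after rescaling, which the "no standards" option cannot guarantee.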
This page is part of the DISW 96 workshop.
Last modified: Thu Jun 20 18:20:11 EST 1996.