Karen Sollins' notes
WAIS and other large documents services - BOF
Steve Hardcastle-Kille, chair
IETF San Diego, evening, March 18, 1992
Purpose: to discuss information services that seem to be
becoming popular enough to become "standards."
Consider: WWW, WAIS, DS (X.500)
Relationships between: documents, objects, and directory
UDI: Need, Form, X.500
Need for whom (see Steve H-K slide)
John Curran (BBN) WAIS: an implementation of Z39.50.
Architecture from user's point of view:
-Servers: source for a collection of documents, indexed
in some way.
-User: can send queries to servers. All documents in
a server are indexed by all words in each document. Returns
bibliographic and other info. including a handle for
retrieving. Provides searching and retrieval all using
- Server can serve more than one source. Servers use
native file system for documents. Don't need to duplicate
- All "things" are considered documents, regardless of
format or content
- Can query a server to find out which sources it
provides. TMC also has a source of sources. Source
descriptions might be better off somewhere else, such as
Differences between Z39.50 and WAIS: Z39.50 is very general,
e.g. about form of data, indices, specific form of queries.
WAIS essentially uses Z39.50 as a transport. Brewster would
actually say that WAIS is the protocol - extensions to
Z39.50 - want to merge them. There are 2 indexing models -
public and private (need CM to use it).
Has relevance feedback. Can attach a particularly relevant
document to a future query, using all words in document as part of
Can add new routines to index on new types of objects.
Currently view everything as text documents.
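The indexing and relevance feedback described above can be sketched roughly as follows. This is only an illustration in Python, not WAIS or Z39.50 code: the class name TinySource, the handle strings, the tokenizer, and the scoring by count of distinct matching words are all assumptions made for the example.

```python
import re
from collections import defaultdict


def words(text):
    """Lowercase word tokens; a stand-in for whatever a real server uses."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySource:
    """Hypothetical source: a collection of documents indexed by every word."""

    def __init__(self):
        self.docs = {}                   # handle -> full text
        self.index = defaultdict(set)    # word -> handles containing it

    def add(self, handle, text):
        self.docs[handle] = text
        for w in set(words(text)):
            self.index[w].add(handle)

    def query(self, text, relevant=()):
        """Rank handles by how many distinct query words they contain.

        `relevant` lists handles the user marked as relevant; all of their
        words are added to the query (the relevance feedback step).
        """
        terms = set(words(text))
        for handle in relevant:
            terms |= set(words(self.docs[handle]))
        scores = defaultdict(int)
        for w in terms:
            for handle in self.index.get(w, ()):
                scores[handle] += 1
        # Highest score first; ties broken by handle for a stable order.
        return sorted(scores, key=lambda h: (-scores[h], h))


source = TinySource()
source.add("doc-1", "Directory services and X.500 naming")
source.add("doc-2", "Hypertext browsing on the network")
source.add("doc-3", "Naming and addressing of network documents")

first = source.query("naming")
refined = source.query("naming", relevant=["doc-3"])
```

Here the refined query pulls in doc-2 only because it shares a word with the document marked relevant, which is the effect the notes describe.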
Wengyik Yeong (PSI): Representing new kinds of objects in X.500
Have presently added RFCs (documents), have 2 document
series (RFCs and FYIs).
Now want to move on to archives (OSI-DS 22 - describes
archives in X.500). Model is that each archive is a file.
Not always true. Sometimes each source is a separate file.
* Need more sophisticated approach
* Need to customize objects - least common denominator not the
best (eg language, size of binary, machine, etc. - not
things that one will find)
* More documentation info would be helpful.
* Flat organization not very good.
* Need more sophisticated experiments - used only two.
Tim Berners-Lee (World Wide Web - CERN)
Hypertext like model: simple uniform interface. All are
subsets of hypertext. The problem is searching in the
hypertext model. Use WAIS or something else for searching -
comes back with a hypertext document.
Architecture: client server.
Client machine which knows lots of protocols for going out
over the network (FTP, Prospero, home-brew (HTTP), etc.)
Addressing scheme: this is a reference.
Also need common formats.
Gateways to other worlds such as WAIS, VMS help files. To
other kinds of servers.
Runs on TCP, send query, get response.
Want to extend to sending authentication, perhaps profile of
client so can know what the client can display.
HTML: markup language for sending back hypertext, also very
User interfaces: for non mouse users tag things with numbers
that they can type.
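The numbered-tag interface for non-mouse users might look something like this. A sketch only, not the actual line-mode browser: the (label, address) tuples, the function names, and the placeholder addresses are hypothetical.

```python
def render(text_parts):
    """Render a mix of plain strings and (label, address) links.

    Each link is tagged with a number in brackets so that a user without
    a mouse can follow it by typing that number.
    """
    out = []
    targets = []                     # targets[n - 1] is the address for tag n
    for part in text_parts:
        if isinstance(part, tuple):
            label, address = part
            targets.append(address)
            out.append(f"{label}[{len(targets)}]")
        else:
            out.append(part)
    return "".join(out), targets


def follow(targets, typed):
    """Map the number the user typed back to a link address, or None."""
    if typed.isdecimal() and 1 <= int(typed) <= len(targets):
        return targets[int(typed) - 1]
    return None


# Placeholder addresses; the real addressing scheme is not modeled here.
page = ["See the ", ("index", "ref-index"),
        " or the ", ("help files", "ref-help"), "."]
screen, targets = render(page)
```

The user sees the text in `screen` and types a number; `follow(targets, "2")` then gives the address to fetch next.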
Have problem of multiple indices.
Too fast a run through. More support for interfaces than for
setting up servers.
How does it fit into everything else?
X.500: need to be able to refer to anything - needs
universal document identifiers (currently use address, but
wrong - might move). Could use DNS, but no further work on
Cover current situation
structure: 3-parts: eg. protocol, host, port
Could get to information (objects as above) from X.500.
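A three-part address of the kind listed above (protocol, host, port) could be pulled apart as in the sketch below. The notes give only the parts, not a syntax, so the "protocol://host:port/rest" form, the Address type, and the example address are assumptions; urlsplit is the Python standard library function.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Address(NamedTuple):
    protocol: str
    host: str
    port: Optional[int]   # None when the address does not name a port
    rest: str             # whatever follows; left uninterpreted here


def parse_address(text):
    """Split 'protocol://host:port/rest' into its parts.

    The concrete syntax is an assumption; the notes only list the parts.
    """
    parts = urlsplit(text)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a protocol/host/port address: {text!r}")
    return Address(parts.scheme, parts.hostname, parts.port, parts.path)


# Made-up example address.
addr = parse_address("wais://server.example:210/some-source")
```

Because the host is part of the address, the reference breaks if the document moves, which is the objection recorded above.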
WAIS vs. WWW vs Gopher
WWW data model: document, text, or hypertext, open
addressing (can always add more components)
Gopher: file or menu, open addressing, very simple server,
large deployment, indexes
WAIS: relevance feedback restricted to a single server,
source file contains organization, indexing, each source is
a closed world.
Gopher, WWW, Prospero, pointers can go back and forth and
all over the place.
Question or comment: concern about being able to jump or charge -
people might like to peer over the edge before jumping,
either because it may be hard to get back or to understand the
cost of jumping.
Code is available to "collaborators" - anyone who uses it or
SLAC, Fermi Lab, etc, really for high energy physicists.
Steve Hardcastle-Kille (Directory issues) OSI-DS 25
Directories in the real world
Global naming: benefits
* express relationships in names
* Listing services in the directory. In the broadest sense
bringing things together. Might use for yellow pages,
multiple providers for similar things. Might use it for
localizing activity. Listings in one place might lead to
listing in others.
* Browsing through X.500 to an external listing service,
such as WWW or WAIS.
* Hierarchy - rigid, but can overlay multiple hierarchies.
* Pointers - alias (forward pointer across the hierarchy)
and "see also"
* Use to model groups as objects with components.
Can parts of the hierarchy (DSAs) really be something else
besides X.500? Might be WWW or WAIS, etc.
Paul Barker (?), UCL project: (just starting up, trying to push the
3 foci (did I miss something here - I have only 2)
* gray literature - unpublished, research documents. Not
systematically available. Store this stuff in the
directory. Question of how to organize, where to hang them
- off individuals, docs for dept, docs for institution, etc.
Experiment in putting documents in the thing.
* (funded by British Library) Want to take MARC records of
library and model them in X.500. One issue is that there are LOTS of
attributes. (Issue - there is no one standard for MARC
* Librarians are especially interested in looking for
Question of whether "The Directory" can contain orders of
magnitude more objects and bigger objects than heretofore.
Cliff Neuman (Prospero)
How relates to others (non-X.500)
Goal: mechanism for organizing information; follows
filesystem model rather than hypertext as in W3. Causes
multiple queries, therefore have to be fast. Directory
service with references to other directories or files. Does
not deal with retrieval (FTP, Andrew, NFS, currently adding
WAIS, will add HTTP). Prospero views a query as a
directory, and response is a file.
Prospero and X.500: can use X.500 to translate soft names to
things to put into Prospero query. Real problem is a single
global naming scheme. Generally organized by owner,
authority, not necessarily organized by topics. Real
problem is what the topics should be and what should be in
them. Believes in multiple name spaces. People can have
own, but typically will start with either a copy of or a
link to another one.
Need shortcuts, so user doesn't have to construct all the
detail of a namespace. Prospero allows you to glue together
parts of other directories, called filters. There are
canned ones, but users can build their own.
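One reading of gluing together parts of other directories is sketched below. This is not the Prospero API: directories are modeled as plain name-to-target dicts, and union_view, apply_filter, and the example entries are hypothetical.

```python
def union_view(*directories):
    """Glue several directories (name -> target dicts) into one view.

    Earlier directories win when the same name appears more than once.
    """
    view = {}
    for d in directories:
        for name, target in d.items():
            view.setdefault(name, target)
    return view


def apply_filter(directory, keep):
    """Return the part of a directory whose entries satisfy `keep`."""
    return {n: t for n, t in directory.items() if keep(n, t)}


papers = {"naming.ps": "host-a:/pub/naming.ps",
          "notes.txt": "host-a:/pub/notes.txt"}
archive = {"naming.ps": "host-b:/mirror/naming.ps",
           "wais.tar": "host-b:/src/wais.tar"}


def only_ps(name, target):
    """A user-built filter: keep PostScript files only."""
    return name.endswith(".ps")


my_view = union_view(apply_filter(papers, only_ps), archive)
```

The user ends up with one directory assembled from pieces of two others, without constructing the whole namespace by hand.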
Closure: (namespace, object) this is how to pass names.
Namespaces really have addresses that are global, and not
used by the user. On the other hand each user can have
his/her own name for any particular namespace.
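The closure idea, passing a name together with the namespace it is relative to, can be sketched as follows. Again not Prospero code: the Namespace class, the "ns-0001" style global addresses, and the helper names are assumptions for illustration.

```python
class Namespace:
    """A namespace with a global address and a table of names."""

    def __init__(self, global_address):
        self.global_address = global_address   # global; not shown to the user
        self.entries = {}                      # name -> object or Namespace

    def bind(self, name, target):
        self.entries[name] = target

    def resolve(self, path):
        """Resolve 'a/b/c' relative to this namespace."""
        node = self
        for component in path.split("/"):
            if not isinstance(node, Namespace):
                raise KeyError(f"{component!r}: parent is not a namespace")
            node = node.entries[component]
        return node


def make_closure(namespace, name):
    """A closure is the pair (namespace, object name), passed together."""
    return (namespace.global_address, name)


def resolve_closure(closure, known):
    """Resolve a closure given a table of global address -> Namespace."""
    address, name = closure
    return known[address].resolve(name)


shared = Namespace("ns-0001")
shared.bind("rfc-index", "file: rfc-index.txt")

alice = Namespace("ns-0002")
alice.bind("standards", shared)   # Alice's own name for the shared namespace

known = {ns.global_address: ns for ns in (shared, alice)}
closure = make_closure(alice, "standards/rfc-index")
result = resolve_closure(closure, known)
```

The name "standards/rfc-index" means nothing in the shared namespace on its own; it resolves only because the closure carries Alice's namespace along with it.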
Larry Masinter, Xerox, System 33
* Document handle: uninterpreted, max 32 byte id that every
doc has. Truly only a content identifier. (A substring of
this is used to find the document, but hidden from users.)
* file location: protocol, host, path, offset, format,
* document: a thing that has a handle.
A lot of the work was in conversion of formats.
Also time on access control - per document ACLs. Made them
part of the description.
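The handle, file location, and per-document ACL described above map naturally onto a small record type. A sketch under assumptions: the field types, the representation of the ACL as a set of user names, and the class names are invented here and are not taken from System 33.

```python
from dataclasses import dataclass, field

MAX_HANDLE_BYTES = 32    # from the notes: "max 32 byte id"


@dataclass(frozen=True)
class FileLocation:
    """Where one copy of a document can be fetched from."""
    protocol: str
    host: str
    path: str
    offset: int
    format: str


@dataclass
class Document:
    """A document is simply a thing that has a handle."""
    handle: bytes                                   # uninterpreted identifier
    locations: list = field(default_factory=list)   # FileLocation entries
    acl: set = field(default_factory=set)           # hypothetical: readers

    def __post_init__(self):
        if not 0 < len(self.handle) <= MAX_HANDLE_BYTES:
            raise ValueError("handle must be 1 to 32 bytes long")

    def may_read(self, user):
        """The ACL travels with the description, so check it here."""
        return user in self.acl


doc = Document(
    handle=b"\x01\x02\x03\x04",
    locations=[FileLocation("ftp", "host.example", "/pub/report", 0, "text")],
    acl={"alice"},
)
```

Keeping the handle separate from the locations is what lets the same document be reachable over several protocols without changing its identity.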
Multiple protocols was a problem because not all machines
had the same protocols. Done by a gateway.
Normalizing attribute-value space would cause there to be
none - LOTS of different kinds of documents. Some are lit,
and library docs, but others might be quotes, job
applications, references, financial reports, etc. Some
properties actually require computation.
Tim back again
W3 document = Prospero directory = menu
All based on an address
W3 has an all inclusive model, but only 2 global namespaces
(DNS and X.500, but DNS is no longer being extended, so the
only one is X.500).
Peter Deutsch: equivalence. Question of two UDIs or
pointers to one document. Also question of exact duplicates
with separate UDIs.
Larry Masinter believes it is ok to have a timestamp in it.