Jill Foster on X.500/WAIS BOF
Jill Foster
See minutes, IETF, the rest of
Jill's report
This was really a companion BOF to
the "Living Documents" BOF which
had seen a wide-ranging discussion
on networked information retrieval.
In contrast, this BOF was more structured
and started with presentations on
the various applications. There
is a need to have some sort of Universal
Document Identifier that could be
used by the various applications.
WAIS
John Curran provided a short description
of WAIS. (Unfortunately no one from
"Thinking Machines" was able to attend
the IETF.) However, John had reasonably
good knowledge and experience of
the application (NNSC have a WAIS
interface to the RFCs).
 _______________             _______________        _______________
|               |           |               |      |               |
|               |           |               |      |   Files of    |
|               |           |     WAIS      |      |  Information  |
|    Client     | --->----- |    Server     |      |  (e.g. RFCs)  |
|               |           |               |      |               |
|               |           |               |      |               |
|_______________|           |_______________|      |_______________|
The WAIS Server holds a pre-built
inverted index of all the words in
each document. (This does not make
sense for non-text files, of course.)
It also holds other information
about the document (size etc.). A
client formulates a query on
behalf of the user and sends it to
the WAIS Server, which searches
the index, then retrieves and returns
the document using the same protocol
(Z39.50). Use of a pre-built index
makes this very fast.
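
The pre-built index idea can be sketched
in a few lines of Python. This is only
an illustration, not the WAIS
implementation: the function names and
sample documents are made up, and the
query is a plain "all words must match"
search, whereas WAIS ranks results.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical sample data; the index is built once, before any query arrives.
documents = {
    "rfc-a.txt": "Internet mail transfer protocol",
    "rfc-b.txt": "File transfer protocol",
}
index = build_index(documents)
print(search(index, "transfer protocol"))
print(search(index, "mail"))
```

Each query only touches the index
entries for its own words, which is
why no document has to be scanned at
query time.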
One WAIS Server may have multiple
sources (and multiple indexes). There
are various WAIS Servers in existence,
but there is currently no way of
querying which Server is responsible
for which source. The possibility
of putting WAIS descriptor files
on a Server or in an X.500 directory
was discussed.
Differences: Z39.50/WAIS:
- WAIS specifies how a query should
be formulated (Z39.50 does not)
- WAIS uses Z39.50 (slightly modified)
as the transport protocol.
- WAIS also provides relevance feedback.
OSI-DS 22:
Wengyik Yeong presented his draft
RFC on representing a public archive
in the directory. He also described
a project using this. A file can
be found using the directory and
then automatically retrieved using
the specified access method.
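
The find-then-retrieve flow can be
sketched as below. This is a sketch
under assumptions: `ArchiveEntry`,
`DIRECTORY` and `retrieve` are
hypothetical names, not the schema of
the draft, and the fetcher only
reports what it would do instead of
transferring a file.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical directory entry for one archived file."""
    access_method: str
    host: str
    path: str

# Stand-in for the directory: file name -> entry (made-up example data).
DIRECTORY = {
    "example-doc": ArchiveEntry("ftp", "archive.example.org", "/pub/example-doc.txt"),
}

def fetch_ftp(entry):
    # A real client would open an FTP session here; this stub only reports it.
    return f"ftp get {entry.host}:{entry.path}"

# One fetcher per access method that directory entries may name.
FETCHERS = {"ftp": fetch_ftp}

def retrieve(name):
    """Find the file in the directory, then use its stated access method."""
    entry = DIRECTORY[name]
    try:
        fetch = FETCHERS[entry.access_method]
    except KeyError:
        raise ValueError(f"no client for access method {entry.access_method!r}")
    return fetch(entry)

print(retrieve("example-doc"))
```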
World Wide Web:
Tim Berners-Lee gave a talk on the
World Wide Web. This project has
been funded to provide a service
to the world wide community of high
energy physicists. It is a hypertext
system. The philosophy behind it
is that a user should be able to
point and click on an item name or
a word within a document and the
associated document would be retrieved
from wherever in the world and presented
to the user in an appropriate format
- without the user having to be aware
of where the document is located
or what the access method is. These
details are hidden in the hypertext
links. There were server programs
for many information servers; gateways
to WAIS, Archie and gopher; and client
programs for various user machines.
The overlap between WWW, WAIS, Archie
and Prospero was noted, and the need
for a UDI by all of these was discussed.
Each application (apart from WAIS)
uses a "handle" for a file which
can be prefixed by something appropriate.
WAIS currently can only have "WAIS"
as the prefix. There is a need for
it to be more flexible.
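
A prefixed handle of this kind could
be handled as sketched below. The
"prefix:handle" syntax, the prefix
names and the `split_udi` function are
assumptions made for illustration; the
BOF did not settle on a UDI format.

```python
# Hypothetical set of prefixes, one per application discussed at the BOF.
KNOWN_PREFIXES = {"wais", "www", "archie", "prospero"}

def split_udi(udi):
    """Split 'prefix:handle' at the first colon and validate the prefix."""
    prefix, sep, handle = udi.partition(":")
    if not sep or not prefix or not handle:
        raise ValueError(f"not a prefixed handle: {udi!r}")
    prefix = prefix.lower()
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    # The handle is left untouched: only the owning application interprets it.
    return prefix, handle

print(split_udi("wais:rfc-index/1234"))
```

Keeping the handle opaque lets each
application keep its own naming while
sharing one outer syntax.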
Mailing lists: WWW-interest@nxoc01.cern.ch
               WWW-talk@nxoc01.cern.ch
OSI-DS25:
Steve Kille discussed this paper
"Representing the Real World in an
X.500 Directory".
A Listing Service may be used to
group like information items together,
for example to provide a Yellow Pages
Service. It:
- Could represent members of a special
interest group.
- Could group documents on a particular subject.
Services such as Archie could be
considered to be Listing Services.
One imagines an information Universe
in which Information Brokers provide
different subject based (say) views
via their listing service. One would
then need to locate the various listing
services (using a mechanism such
as a directory?)
OSI-DS mailing list: osi-ds@cs.ucl.ac.uk
Subscriptions: osi-ds-request@cs.ucl.ac.uk
UK British Library Project:
Paul Barker described a project,
sponsored by the BL, to represent
grey literature (unpublished research
papers) in the Directory. The project
is thought to be unlikely to succeed
- but one of the aims is to demonstrate
whether or not it is possible. They
will take the (UK) MARC records and
model these within X.500. They might
also consider trying to provide a
listing service so that the documents
might be retrieved more readily by
subject area.
Prospero:
Cliff Neuman described Prospero.
It follows a file system (rather
than hypertext) model. It is built
on UDP. It has the notion of a Directory
which contains links to other objects
(other directories or files). It
returns the link to the information
object and then automatically retrieves
the file by the appropriate access
method (Archie, WAIS, nntp, WWW - soon!,
NFS, ftp etc.). It has linked very
successfully with archie. Cliff stated
that he
expected to be able to use X.500
to translate between the document
ID and how to get the document.
With Prospero the user has his own
view of the global information base
(or has a view built for him). Cliff
thought there should be multiple
name spaces - but the difficulty
would be that these would need representing
near the top of the directory tree.
With multiple user chosen views
- this would be difficult to manage.
Also two users might refer to an
object by different handles which
would be relative to their individual
name spaces - difficult when passing
references (say in a mail message)
from one person to the other.
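
The reference-passing difficulty can
be made concrete with a small sketch.
Everything here is hypothetical (the
`View` class, the user names, the
identifier "obj-001"); it is not
Prospero's data model, only a picture
of two per-user name spaces over one
object.

```python
class View:
    """One user's name space: local handle -> global object identifier."""

    def __init__(self, links):
        self.links = dict(links)

    def resolve(self, handle):
        return self.links[handle]

    def handle_for(self, global_id):
        # Reverse lookup: how does this user name the given object, if at all?
        for handle, target in self.links.items():
            if target == global_id:
                return handle
        return None

def translate(handle, sender, receiver):
    """Rewrite a sender-relative handle into the receiver's name space."""
    return receiver.handle_for(sender.resolve(handle))

alice = View({"/papers/report": "obj-001"})
bob = View({"/inbox/from-alice": "obj-001"})

# Alice's handle means nothing in Bob's view until it is translated.
print(bob.links.get("/papers/report"))
print(translate("/papers/report", alice, bob))
```

The translation only works because
both views bottom out in a shared
identifier, which is the role a
resolvable document ID would play.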
Mailing list: info-prospero@isi.edu
System 33
Larry Masinter talked about a project
at Xerox PARC. There was the concept
of a
- HANDLE
  - 32 byte number (is a content ID)
- FILE Location (6 part)
  - Protocol; Host; Path; piece; format; timeout
- Description
  - (normal "Catalogue" information:
    name, Author, Document, etc)
There is format negotiation when
a document is retrieved.
Access Control was also considered:
the ACL is part of the description.
The Server exploits multiple protocols
for search and retrieval.
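
The three-part record can be pictured
as the data structure below. This is a
sketch under assumptions: the field
types, the use of SHA-256 to obtain a
32 byte content ID, the timeout in
seconds and the `negotiate` function
are all illustrative choices, not
details given in the talk.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    """The six-part file location."""
    protocol: str
    host: str
    path: str
    piece: str
    format: str
    timeout: int  # assumed to be seconds

@dataclass
class Document:
    handle: bytes       # 32 byte content ID
    locations: list     # one Location per available copy or format
    description: dict   # catalogue information, including the ACL

def content_handle(content):
    # Assumption: SHA-256 is used here only because its digest is 32 bytes long.
    return hashlib.sha256(content).digest()

def negotiate(doc, accepted_formats):
    """Return the first location whose format the client accepts, in client order."""
    for wanted in accepted_formats:
        for location in doc.locations:
            if location.format == wanted:
                return location
    return None

doc = Document(
    handle=content_handle(b"example memo text"),
    locations=[
        Location("ftp", "files.example.org", "/memo", "all", "text", 30),
        Location("ftp", "files.example.org", "/memo.ps", "all", "postscript", 30),
    ],
    description={"name": "Example memo", "author": "A. N. Other", "acl": ["staff"]},
)
print(len(doc.handle))
print(negotiate(doc, ["dvi", "postscript"]).path)
```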
There is a problem with dealing with
different types of document:
- applications for jobs
- product specs.
- memos
- contracts
- faxes
- etc.
It is difficult to normalise the
attributes of a general document.
Summing up
Tim Berners-Lee summed up by saying
that all applications described had
a need for a Unique Doc ID and for
a name service for this. The UDI
needed to be resolvable. (This is
not the same as USDN - content ID
- described earlier). There should
be a WG on details of UDI (but this
needs a better name) and a separate
one for USDN (and the need for a
single resolver for these). Chris
Weider agreed to co-author a document
on the issues.
I suggested that it might be useful
to try just doing this: that is,
to have a pilot project to try putting
UDIs in the directory for a set
of files and to have the gopher,
archie, and Prospero people
try to utilise these.