Usenet News meets The Web
Introduction
Several months ago my frustration with existing newsreading software got to
the point that I began writing my own. At about that time, I
discovered Mosaic and started using the World Wide Web. I quickly realized
that the Web was the way to go, and started over, this time working on an
nntp-to-html gateway.
As I thought about it, I came to the conclusion that it would be worthwhile to
re-evaluate systems like news and mailing lists in light of newer systems, like
wais and the web. I eagerly awaited the creation of comp.infosystems.www, as
a location to bring up this discussion.
Unfortunately, just as ciw was created, an onslaught of real work interfered,
and I had to put my thoughts on the back burner. But then Mosaic's group
annotation system was activated with the announcement of an experimental
public-access "group" annotation server. The flurry of activity prompted
several people to start musing about the web and news, and I thought I should
organize my thoughts and present them.
Before I begin, I want to assure the folks coming from the usenet side of
things that I am not out to destroy or mutate the news system,
particularly "out from under" the existing users. In the proposals at the end
of this, I have tried to suggest a system which will augment the existing
system with a minimum of interference for those who do not wish to be involved.
But it will allow growth in new directions, and I would hope that eventually
most users would choose to adopt some or all of the additional functionality.
Current Systems
First I would like to briefly discuss some of the existing systems and
projects; what they try to do, and some of their advantages and disadvantages.
Mailing Lists
These were about the first "conferencing" systems. With today's server
software (e.g., LISTSERV), they are easy to set up, and can be created quickly.
They are best suited to small, scattered groups with low traffic. The mail is
only sent to those who ask for it, it is usually fairly direct and fast, and
for people who use, e.g., biff, there is a built-in alert mechanism. Also,
since mail is usually the first thing people get working on a newly-networked
computer, it is the most available. Disadvantages include nonscalability: as
the list grows large, the mailing hosts and "exploders" get heavily loaded. As
traffic goes up, the users' disk quotas vanish. Users can't just "stop
reading"; they have to sign off, which is sometimes not straightforward
(especially where exploders are used). Some mailing lists have other
channels associated, for instance for the distribution of software. Many are
automatically archived by the server daemon. With the advent of MIME, they
are perhaps the first wide-scale multimedia-capable conferencing method.
Usenet News
As mailing lists grew unmanageable, usenet news was created to address the
problems of load and traffic. The flood-fill method of distribution provides
for a fairly fast transmission, but with a distributed load and (in multiply-
connected areas) a resistance to network outages. The built-in expiry
mechanism is essential. There are actually few
overt problems with the current usenet scheme, though there are some areas of
concern. There is some multimedia (e.g.,
alt.binaries.pictures.orchids), but it's not generally standard or
automatic. The "real" usenet hierarchies have a strict procedure for the
creation of a group, which provides a means of control, but the alt
hierarchy provides a place for rapid reaction, as well as more casual
gatherings. (Though it suffers a bit from a failure to remove old
groups: alt.desert-storm lingers on at my site, empty.) There is the
problem of authenticity; forged posts (especially to moderated groups) and
bogus cancel messages are becoming more common. Perhaps the
item of most concern is the increase in traffic: popular unmoderated groups
get so much traffic they are difficult to read; this leads to an increased
demand for moderated groups, but the increase in traffic makes moderation
almost a full-time job (see comp.dcom.telecom or
rec.humor.funny). There are some experiments with more intelligent
news-selecting software, but nothing on a large scale. On the flip side,
moderators can't please everybody (see the perennial complaints
about rec.humor.funny, or the ARMM controversy).
Another problem stems from one of the critical advantages: the expiration of
articles. This "lack of memory" leads to the same issues being discussed
over and over again, which in turn leads to FAQs and other periodically
posted items. I can't believe that repeatedly posting the same thing is an
efficient means of distributing information. Usenet is fairly timely, and for
those who want it there are news programs which will feep the user when new
messages arrive.
Smarter newsreaders
The advantage of the "user-agent" approach is that anyone can write their own
newsreader. This has resulted in newsreaders which do more advanced article
selection and grouping. The "Usenet Interface
Project" is working towards schemes for filtering, rating, and
annotating news.
Anonymous FTP and Archie
Not really a "conferencing" system, aftp is more suited towards the long-term
storage of references and resources. Archie provides a global searching
facility. The main problems are the high loads on the few archie servers and
on the popular ftp sites.
WAIS
The big feature of WAIS is the smart searching capability.
The World Wide Web
The youngest of the popular protocols. It started with basically two ideas:
- A common addressing scheme for everything -- anonymous ftp, wais, gopher,
news, etc. (the addresses are URLs, for Uniform Resource Locators).
- Hypertext, using this addressing scheme.
With the creation of the Mosaic web browser (a browser is to the web what a
newsreader is to usenet), the web rapidly moved from hypertext to "hypermedia."
It now supports images, movies, sound, arbitrary binaries, etc. Because by
definition everything already available (by ftp, etc.) is "on the web," the
project was jump-started past critical mass.
Web Annotations
Since the very early versions, the Mosaic web browser has had the ability to
create "personal annotations" -- per-user "jottings in the margins." These
are stored in the user's home directory, indexed by URL. At some point a
latent "group annotation" facility was added, a way for a workgroup or
organization to share notes. (There is also the suggestion of a "public
annotation," but this has not been implemented yet.) A little while ago,
as an experiment, the authors of Mosaic announced a "public" group annotation
server. It was wildly successful. Anything on the web can be annotated,
so comments can be placed directly at the point of interest. Since annotations
can themselves be annotated, this followup mechanism leads to a natural
threading of the ensuing discussions. Let me quickly note that the use of the
annotation server is strictly voluntary -- if a user does not specifically
point his browser at the server, he will never be bothered by what
he may consider net.graffiti. Since the annotations are hypertext, they
can contain pointers to related resources or discussions. Clearly, this
server mechanism does not scale to full public use, but that is one of the
things I wish to address.
Possibilities
Public annotations
For public annotations, we need a system with a distributed load, an expiry
mechanism, and a fairly robust distribution system. In short, we need NNTP.
So what I would suggest is the trial creation of a newsgroup, say
alt.web.annotations. A public annotation would be made by posting
an article. An additional header, say "X-annotated-URLs:" would list the
document(s) annotated. The expiry system can be used "out of the box" --
presumably, if any comments were good enough to keep, the author of the
original document would put in a static pointer. It is important to also have
in each posting/annotation a pointer to this top document, so sub-annotations
(followups) don't get orphaned when the parents expire. Further details
can be hashed out as we discuss implementation
(e.g., we need a clear way of "signing" a document with an e-mail address --
we need this anyway -- so a browser can easily e-mail the author).
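As a rough illustration of what such a posting might look like (the
"X-annotated-URLs:" header is the one suggested above; the "X-top-URL:"
header name, the addresses, and the URLs are invented here purely as
placeholders):

  Newsgroups: alt.web.annotations
  From: grower@botany.example.edu
  Subject: Re: the repotting section
  Message-ID: <2a7f@botany.example.edu>
  References: <parent-annotation@other.example.edu>
  X-annotated-URLs: http://botany.example.edu/orchids/culture.html
  X-top-URL: http://botany.example.edu/orchids/culture.html

  <p>The repotting section could point at
  <a href="http://botany.example.edu/orchids/media.html">the media list</a>,
  which now covers bark mixes.</p>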
Newsgroup URLs
Many newsgroups have a set of periodically posted articles. It would be nice
if each newsgroup had associated with it a URL pointing to whatever resources
were associated with the group -- the FAQ, or the archives, or a "home page"
with links to these and other related resources. Notice that each newsgroup
can already have a title associated with it: the 'LIST NEWSGROUPS' nntp
command provides these titles. I think an elegant, though probably unpopular,
way of doing the association would be to allow this title to be either a URL
or an html-style anchored title.
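For illustration, a line of 'LIST NEWSGROUPS' output might then change from a
plain title to an anchored one (the group's description and home-page URL here
are made up):

  before:  alt.orchids  Growing and propagating orchids.
  after:   alt.orchids  <a href="http://botany.example.edu/orchids/home.html">Growing and propagating orchids.</a>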
This home page would then describe the group, with pointers to the charter,
the FAQ, the archives, etc.
I suspect that this might be unpopular, since it is a change visible to the
uninterested. As an alternative, I'd propose a mechanism built into the
nntp-html gateway code.
NNTP to HTML gateway
I think we'll need to write a companion gateway, that can sit next to the
nntp server and spit out html. (I suppose we could unite the two, and just
make a replacement server). This nntp-html gateway can present newsgroups
as "home pages" that can be publicly annotated (obviously, where real home
pages exist, they'd be used: thus users can't help but see the FAQ (or at
least a pointer to it) before posting!). The gateway will need to interpret
the followup-structure of "regular" postings to present them in the
annotation followup style. For public annotations of regular web pages, the
annotation is sent out on the special group, and that's it. For public
annotations of newsgroups, we have a couple of possibilities. Clearly, we want
something to go back out onto the "real" newsgroup. We can post (or crosspost)
the annotation directly to the newsgroup. Unfortunately, this is liable to
annoy those who don't have web-aware newsreaders and who don't like reading
raw html. Alternatively, we could post the real annotation to the special
annotation group, and a stripped-text version (with a pointer in a header,
say "X-unstripped-pointer:") to the regular group. It wouldn't be as rich,
but we could phony something up with footnotes that wouldn't look too bad.
Receiving gateways would notice the duality, and just use the hypertext
version. This has the advantage that it's non-intrusive, but the disadvantage
that everything would be posted twice. Maybe we'll persuade people to convert
fast enough that the extra traffic won't kill us.
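To make the stripped-text companion concrete, here is a minimal sketch
(Python is just my choice of illustration language) of how a gateway might
demote the html links to numbered footnotes and attach the suggested
"X-unstripped-pointer:" header. The function name and the footnote style are
assumptions, not a fixed design:

  import re

  def strip_annotation(html_body, unstripped_message_id):
      """Produce the plain-text twin of an html annotation."""
      footnotes = []

      def anchor_to_footnote(match):
          href, text = match.group(1), match.group(2)
          footnotes.append(href)
          return "%s [%d]" % (text, len(footnotes))

      # Turn <a href="...">text</a> into "text [n]" and collect the URLs.
      text = re.sub(r'<a\s+href="([^"]*)"\s*>(.*?)</a>',
                    anchor_to_footnote, html_body, flags=re.S)
      # Crude removal of any remaining markup.
      text = re.sub(r'<[^>]+>', '', text)

      if footnotes:
          text += "\n\nReferences:\n" + "\n".join(
              "[%d] %s" % (n + 1, url) for n, url in enumerate(footnotes))

      header = "X-unstripped-pointer: %s" % unstripped_message_id
      return header + "\n\n" + text

A receiving gateway that notices the X-unstripped-pointer header would simply
discard this copy and use the hypertext original instead.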
Selection of News
The public group annotation server experiment has shown that, because the
system was used as a discussion/newsgroup type of service, we need a way of
seeing what new annotations have been made. Some people have just been
scanning the server's directory of annotations, but this rather dry approach
works only because of the comparatively tiny traffic. We need a new solution,
particularly if we're going to read news via the same interface.
One immediate implication of the news-->annotation shift is that newsgroups
become less structured: in fact, anything on the web is likely to serve as
a basis for a thread of discussion. This means traditional methods of
selecting newsgroups won't fill the entire need. I think we need a "selection
engine" à la WAIS. (I'm not too familiar with WAIS beyond the standard
documentation, so I don't know if we can just take it over as-is, or if we'll
need to work on it.) We feed the selection engine pages of data specifying
what we're interested in. These can be traditional headers, like
newsgroups: alt.orchids, or subject lines, or the URLs of things we've
written or annotated recently (our browser can keep track of this), etc. In
fact, we can feed the selection engine rules that change the 'goodness'
value of posts -- for instance, I'd probably give a big positive delta to
anything with From: henry@zoo.toronto.edu. I'd like to note in
passing that this functionality will address the Usenet Interface Project's
number 1 project. (Number 3 will basically fall out of the annotation style
of doing things, and 2 is covered later.)
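As a purely hypothetical sketch of the sort of rules I have in mind, a post's
'goodness' could simply be the sum of the deltas whose patterns match its
headers; the rule format and the numbers below are invented:

  import re

  RULES = [
      ("Newsgroups", r"\balt\.orchids\b",         +5.0),
      ("Subject",    r"Pholidota",                +3.0),
      ("From",       r"henry@zoo\.toronto\.edu", +10.0),  # the big positive delta
  ]

  def goodness(headers, rules=RULES):
      """headers: a dict mapping header name to value."""
      score = 0.0
      for header, pattern, delta in rules:
          if re.search(pattern, headers.get(header, "")):
              score += delta
      return score

  article = {"From": "henry@zoo.toronto.edu",
             "Newsgroups": "alt.orchids",
             "Subject": "Repotting Pholidota"}
  print(goodness(article))   # 18.0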
What about moderation? Ignoring for the moment the interface to traditional
newsgroups, this selection engine approach also answers the moderation problem.
A moderator can edit a document (pointed to by the home page) which contains
deltas or coefficients for the 'goodness' of postings. By feeding this
document to the selection engine (which a user could have done automatically
by the browser), the moderator's choices are enforced. Traditional yes/no
moderation would involve coefficients of 0 or 1: a 1 on anything approved, and
a 0 'wildcard'. Interested moderators could rank postings, or use fractional
values for borderline postings, etc.
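A minimal sketch of applying such a coefficient document, assuming an invented
one-entry-per-line format (message-id and coefficient, with '*' as the
wildcard):

  def parse_coefficients(text):
      """Parse lines of the form '<message-id> coefficient'."""
      table, wildcard = {}, 1.0
      for raw in text.splitlines():
          line = raw.strip()
          if not line or line.startswith("#"):
              continue
          msgid, value = line.split()
          if msgid == "*":
              wildcard = float(value)
          else:
              table[msgid] = float(value)
      return table, wildcard

  def moderated_goodness(raw_score, message_id, table, wildcard):
      # Unrated articles fall back to the wildcard (0 for yes/no moderation).
      return raw_score * table.get(message_id, wildcard)

  doc = """
  # alt.fan.feynman selections
  <approved-1@site.example.edu>   1
  <borderline@site.example.edu>   0.4
  *                               0
  """
  table, wildcard = parse_coefficients(doc)
  print(moderated_goodness(18.0, "<approved-1@site.example.edu>", table, wildcard))  # 18.0
  print(moderated_goodness(18.0, "<unrated@site.example.edu>", table, wildcard))     # 0.0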
I'd also like to mention that this provides a transparent mechanism for
multiple moderators. I could, for instance, tell my browser to apply
ftp://redpoll.mrfs.oh.us/red/aff.slc when I read
alt.fan.feynman. I would then invoke the selection rules that some
motivated 'editor' creates. Others, whose tastes differ, would never be
affected. Also, people could choose to ignore the moderator, and see
basically an unmoderated version of the group.
What about traffic? One benefit of moderated groups is the ease on the reader,
another is the ease on the network. If groups with multiple versions of
moderation develop (including unmoderated), the traffic load would go up. I'd
reply that the server-to-server communication could also use the selection
engine. A site may elect to receive only the articles approved by the
'official' moderator.
I'd also mention that a smart server could make a note of the selection rules
users pass it, and perhaps incorporate them into its own selection criteria.
Now what about traditional newsgroups again? I'd suggest that what we do now
is twofold:
- when posting/annotating to a regular group, the annotation
always goes out on the annotation group. The stripped version that would go
to the 'real' group is, as usual, sent to the moderator.
- when reading a regular group, if there is no moderator's selection list
indicated on the home page, the server makes one up based on the Approved:
headers (a sketch of this fallback follows below).
This is both backwards and forwards compatible.
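A sketch of that fallback, producing a coefficient list in the same invented
format as above (the moderator's address and the helper name are
hypothetical):

  def selection_from_approved(articles, moderator):
      """articles: a list of header dictionaries for the group."""
      lines = []
      for headers in articles:
          if headers.get("Approved", "").strip() == moderator:
              lines.append("%s 1" % headers["Message-ID"])
      lines.append("* 0")   # anything not explicitly approved scores zero
      return "\n".join(lines)

  articles = [
      {"Message-ID": "<ok@site.example.edu>", "Approved": "moderator@site.example.edu"},
      {"Message-ID": "<unapproved@site.example.edu>"},
  ]
  print(selection_from_approved(articles, "moderator@site.example.edu"))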
Multimedia
The usenet is already sort-of multimedia: there are groups like
alt.binaries.pictures.orchids. But this isn't terribly transparent
or automatic. The web is already multimedia, but it uses pointers to the
images and other non-text items. If I were to annotate/post a picture, the
current web paradigm would keep the image in one place, and send out a pointer
to it. Clearly, this isn't a solution.
For sending multimedia, the first thing that comes to mind is MIME. I don't
see why we can't just use it. There is one problem, which I don't know if
MIME addresses (I haven't seen it mentioned, but I haven't looked hard):
splitting large documents. Supposedly there are news gateways which reject
or truncate articles which are too long, so the picture of a really nice
Pholidota might be sent out in three or four parts. If MIME doesn't address
this directly, I'd suggest the nntp-html gateway split the MIME document,
post each part (noting the message-ids used), and then post a "cover" document
(the "part 0" one) that 1) indicates it's a cover document, and 2) provides
pointers to the pieces. Traditional folks would have much the same style of
multipart things to put together (except they'd have to wash it through MIME),
and web browsers would do the work automatically.
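A sketch of that splitting scheme; the part size, the posting function, and
the "X-..." cover headers are all invented for illustration:

  def split_and_post(mime_document, max_bytes, post):
      """Post a large MIME document in pieces, then post a 'part 0' cover."""
      parts = [mime_document[i:i + max_bytes]
               for i in range(0, len(mime_document), max_bytes)]
      ids = [post(body=part, part_number=n + 1, total=len(parts))
             for n, part in enumerate(parts)]
      cover = ["X-cover-document: yes"]                         # 1) mark the cover
      cover += ["X-part-pointer: %s" % msgid for msgid in ids]  # 2) point at the pieces
      return post(body="\n".join(cover), part_number=0, total=len(parts))

  # A stand-in 'post' that just invents a message-id for each posting.
  posted = []
  def fake_post(body, part_number, total):
      msgid = "<part%d.%d@gateway.example.edu>" % (part_number, len(posted))
      posted.append(msgid)
      return msgid

  print(split_and_post("X" * 250000, 100000, fake_post))   # the cover's message-id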
Security
There has already been some brief discussion of a way of cryptographically
signing documents for authentication. I don't know of any system that can
be freely used everywhere, but if we come up with one, it would go a long way
towards 'guaranteeing' the moderation of a newsgroup: the moderator signs his
selection document(s), and each entry has a cryptographic hash of the document
in question to guarantee somebody isn't disguising articles. This needs
discussion.
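As a sketch of the hashing half of the idea (SHA-256, the entry format, and
the helper names are assumptions; how the selection document itself gets
signed is left open, as above):

  import hashlib

  def entry_for(message_id, article_text, coefficient):
      """One selection-document entry: id, coefficient, and article digest."""
      digest = hashlib.sha256(article_text.encode()).hexdigest()
      return "%s %s %s" % (message_id, coefficient, digest)

  def verify(entry, article_text):
      """Check that the article on hand is the one the moderator rated."""
      message_id, coefficient, digest = entry.split()
      return hashlib.sha256(article_text.encode()).hexdigest() == digest

  article = "Subject: test\n\nBody of the approved article.\n"
  entry = entry_for("<ok@site.example.edu>", article, 1)
  print(verify(entry, article))                # True
  print(verify(entry, article + "tampered"))   # False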
Multiple Servers, and Scale
One disadvantage of this proposed scheme is that, though the selection system
may make newsreading faster by eliminating uninteresting documents, it is
likely to greatly increase the number of postings to what could be termed
minor groups (since any document could serve as the seed for a thread). On
the other hand, by using the web, responsible people can rather transparently
reduce the load on the net. It has always been possible to post a small
article saying that the full text, or the full code, on some less popular
topic is available on-line, for instance by ftp. But the additional overhead
of actually running ftp and fetching the document means none but the most
motivated of readers will actually bother. To get an audience, the entire
thing has to be posted. However, in the web, this indirection is merely an
additional click away. A conscientious user could start off posting merely
a pointer to a document in his own webspace. If it proves more popular than
he anticipated, he can always stave off the load by "editing the annotation"
(reposting) to include the document directly.
This scheme, where any page can serve as a focus for discussion and a channel
for distribution, has many of the same advantages as a mailing list, such as
ease of starting and focus. If a subject increases in popularity, the scheme
scales nicely. However, if the increased volume leads sites to stop carrying
all of this 'minor' annotation traffic, we can use a multiple-server
scheme. Currently, reading news from two or more nntp servers is not
transparent; with this web scheme, it would be: a major point of the web is
the transparent access to resources on many nodes. The two extremes are
simple: either a 'group' is available on the local news server, or if not, the
reader could use the server suggested in the group's home page. Intermediate
cases, where a group is available at a non-local but closer site, require some
thought. It would be nice if this discovery were all automatic, à la Archie.
If we used a proxy system, where articles are always retrieved from the local
server (but the local server might have to fetch them from somewhere else),
then a smart server could make note of the interest, and if it continues,
could modify its selection criteria to begin a newsfeed. This scheme would
make everything available, but there would be network traffic only when there
was a real or anticipated interest.
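A sketch of that proxy behaviour, using Python's standard nntplib module
(still shipped through Python 3.12); the mapping from group to suggested
server is just a placeholder dictionary standing in for the group's home page:

  import nntplib
  from collections import Counter

  HOME_PAGE_SERVER = {"alt.orchids": "news.far.example.edu"}   # invented mapping
  remote_interest = Counter()

  def fetch_article(group, message_id, local_host="localhost"):
      """Try the local server first, then the server suggested for the group."""
      for host in (local_host, HOME_PAGE_SERVER.get(group)):
          if host is None:
              continue
          try:
              with nntplib.NNTP(host) as server:
                  resp, info = server.article(message_id)
          except (nntplib.NNTPError, OSError):
              continue
          if host != local_host:
              # Note the non-local interest; a smart server could start a
              # real newsfeed once this count stays high enough.
              remote_interest[group] += 1
          return info.lines
      return None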
Far Future?
If we move in the directions I mentioned, what
will the future news system look like? What I have in mind is a much less
centralized, much less controlled system. Any document on the web can serve
as a seed for discussion. This will make for much more focused groups of
articles, but it will require a smarter selection system to find interesting
ones. This system will allow 'interesting' to be tuned more finely, and
if we use the smart-server approach, the news servers themselves can use
this selection information to help minimize the network load. Moderation in
the traditional sense (of not allowing an article to go out) will be gone, but
will be replaced by a 'rating' or 'editing' scheme.
One problem would be the loss of a specific set of "core groups." The existing
CFD/CFV system serves a purpose, which would be lost here. The main advantages
of this system are 1) it limits the number of groups a user has to be aware of,
and reduces clutter and junk, 2) because many sites carry only these core
groups, it helps reduce network traffic, and 3) it prevents needless
splintering and duplication. The first and third points would be addressed by the
selection system. However, there would still be a need for one or a few
respected official lists. These could serve as a starting point and, if we
don't come up with a good smart-server approach, they could serve as the basis
for the server-to-server selection criteria.
Conclusions
The existing network news system does a very good job. However, recent
innovations and developments, including multimedia, selection engines, and
the integration of resources into one web, provide ways we can improve the
news system. I have suggested a system that can initially sit alongside the
existing news system, providing users who switch with all the information in
the old system, and all the advantages of the new. It would offer a way of
being more selective about articles, it would "thread" items into their
natural discussions, it would be transparently multimedia, and it
would link into the world-wide web of networked resources. New "groups" of
related articles can more easily be spun off, giving better focus, but the
selection mechanism prevents this blizzard from blinding a user. A rating
scheme would supplement the current moderation scheme, allowing not only a
graded scale of articles but also catering to differing levels of
interest. As the system develops, "smart" news servers could use their users'
selection criteria to form their own, used when communicating with other
servers. This would limit network traffic to articles actually expected to be
read, while still making everything available to the user.
Discussion
I will post this to several newsgroups involved in this discussion:
news.future, comp.infosystems.www,
comp.infosystems.wais, comp.mail.mime, and
comp.multimedia. Followups will be directed to the first two, as
they are the ones more than casually involved. While it is possible to
annotate this document using NCSA's experimental server, the annotations
wouldn't be seen by most of the network community. Therefore, nice though
annotations may be, I'd have to ask you to post instead. (Particularly because
anyone annotating is likely a web supporter, and I'll need as many supporters
as possible when I let this loose on the Usenet world!)
Frederick.