Usenet News meets The Web

Introduction

Several months ago my frustration with existing newsreading software got to the point that I began writing my own. At about that time, I discovered Mosaic and started using the World Wide Web. I quickly realized that the Web was the way to go, and started over, this time working on an nntp-to-html gateway.

As I thought about it, I came to the conclusion that it would be worthwhile to re-evaluate systems like news and mailing lists in light of newer systems like WAIS and the web. I eagerly awaited the creation of comp.infosystems.www as a place to raise this discussion.

Unfortunately, just as comp.infosystems.www (ciw) was created, an onslaught of real work interfered, and I had to put my thoughts on the back burner. But then Mosaic's group annotation system was activated with the announcement of an experimental public-access "group" annotation server. The flurry of activity prompted several people to start musing about the web and news, and I thought I should organize my thoughts and present them.

Before I begin, I want to assure the folks coming from the usenet side of things that I am not out to destroy or mutate the news system, particularly "out from under" the existing users. In the proposals at the end of this article, I have tried to suggest a system which will augment the existing system with a minimum of interference for those who do not wish to be involved. But it will allow growth in new directions, and I would hope that eventually most users would choose to adopt some or all of the additional functionality.

Current Systems

First, I would like to briefly discuss some of the existing systems and projects: what they try to do, and some of their advantages and disadvantages.

Mailing Lists

Usenet News

Smarter newsreaders

Anonymous FTP and Archie

WAIS

The World Wide Web

Web Annotations

Possibilities

Public annotations

For public annotations, we need a system with a distributed load, an expiry mechanism, and a fairly robust distribution system. In short, we need NNTP. So what I would suggest is the trial creation of a newsgroup, say alt.web.annotations. A public annotation would be made by posting an article. An additional header, say "X-annotated-URLs:", would list the document(s) annotated. The expiry system can be used "out of the box"; presumably, if any comments were good enough to keep, the author of the original document would put in a static pointer. It is also important that each posting/annotation carry a pointer to the top document (the one originally annotated), so that sub-annotations (followups) don't get orphaned when their parents expire. Further details can be hashed out as we discuss implementation. For example, we need a clear way of "signing" a document with an e-mail address (something we need anyway), so that a browser can easily e-mail the author.
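A minimal sketch of such an annotation article, in Python, using the standard `email` package to build the headers. `X-annotated-URLs:` is the header proposed above; `X-annotation-root:` is a hypothetical name for the pointer to the top document, and all addresses and URLs are placeholders. The point to notice is that a followup copies the root pointer from its parent, so it still leads back to the top document after the parent has expired.

```python
from email.message import EmailMessage

ANNOTATION_GROUP = "alt.web.annotations"  # the trial group suggested above

def make_annotation(author, annotated_urls, root_url, subject, body_html,
                    group=ANNOTATION_GROUP):
    """Build a public annotation as an ordinary news article."""
    msg = EmailMessage()
    msg["From"] = author                  # doubles as the e-mail "signature"
    msg["Newsgroups"] = group
    msg["Subject"] = subject
    # Proposed header: the document(s) this article annotates.
    msg["X-annotated-URLs"] = " ".join(annotated_urls)
    # Hypothetical header: the top document of the whole discussion.
    msg["X-annotation-root"] = root_url
    msg.set_content(body_html, subtype="html")
    return msg

def make_followup(parent, author, body_html):
    """A sub-annotation inherits the parent's root pointer unchanged."""
    followup = make_annotation(
        author,
        str(parent["X-annotated-URLs"]).split(),
        str(parent["X-annotation-root"]),
        "Re: " + str(parent["Subject"]),
        body_html,
        group=str(parent["Newsgroups"]),
    )
    # Keep the ordinary news threading too, when the parent has an ID.
    if parent["Message-ID"] is not None:
        followup["References"] = str(parent["Message-ID"])
    return followup
```

Because the annotation is just a news article with a couple of extra headers, existing servers can carry and expire it without modification.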

Newsgroup URLs

Many newsgroups have a set of periodically posted articles. It would be nice if each newsgroup had associated with it a URL pointing to whatever resources were associated with the group: the FAQ, the archives, or a "home page" with links to these and other related resources. Notice that each newsgroup can have a title associated with it: the nntp 'LIST NEWSGROUPS' command returns each group's name along with its title. I think an elegant, though probably unpopular, way of making the association would be to allow this title to be either a URL or an html-style anchored title. The home page would then describe the group, with pointers to the charter, the FAQ, the archives, etc.
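A sketch, in Python, of how a gateway might read these titles. It assumes the usual LIST NEWSGROUPS response format (group name, whitespace, title, one group per line) and recognizes the two forms suggested above: a bare URL, or a single `<a href="...">title</a>` anchor. Anything else is treated as an ordinary old-style title. The group names and URLs in the test lines are placeholders.

```python
import re

# A title of the form <a href="URL">text</a> (assumes a double-quoted href).
ANCHOR = re.compile(r'<a\s+href="([^"]+)"\s*>(.*?)</a>', re.IGNORECASE)
# A title that is nothing but a URL, e.g. http://... or ftp://...
BARE_URL = re.compile(r'^[a-z][a-z0-9+.-]*://\S+$', re.IGNORECASE)

def parse_newsgroups_line(line):
    """Return (group, home_page_url_or_None, display_title) for one response line."""
    fields = line.rstrip("\r\n").split(None, 1)
    group = fields[0]
    title = fields[1].strip() if len(fields) > 1 else ""
    anchor = ANCHOR.search(title)
    if anchor:
        return group, anchor.group(1), anchor.group(2)
    if BARE_URL.match(title):
        return group, title, title
    return group, None, title   # an ordinary, old-style title
```

Old-style titles pass through untouched, which is what keeps this change harmless for readers who never look at the web side.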

I suspect that this might be unpopular, since it is a change visible even to those who aren't interested. As an alternative, I'd propose a mechanism built into the nntp-html gateway code.

NNTP to HTML gateway

I think we'll need to write a companion gateway that can sit next to the nntp server and spit out html. (I suppose we could unite the two, and just make a replacement server.) This nntp-html gateway can present newsgroups as "home pages" that can be publicly annotated. Where real home pages exist, they'd obviously be used; thus users can't help but see the FAQ (or at least a pointer to it) before posting! The gateway will need to interpret the followup structure of "regular" postings to present them in the annotation followup style. For public annotations of regular web pages, the annotation is sent out on the special group, and that's it. For public annotations of newsgroups, we have a couple of possibilities. Clearly, we want something to go back out onto the "real" newsgroup. We can post (or crosspost) the annotation directly to the newsgroup. Unfortunately, this is liable to annoy those who don't have web-aware newsreaders and who don't like reading raw html. Alternatively, we could post the real annotation to the special annotation group, and a stripped-text version (with a pointer in a header, say "X-unstripped-pointer:") to the regular group. It wouldn't be as rich, but we could phony something up with footnotes that wouldn't look too bad. Receiving gateways would notice the pointer and just use the hypertext version. This has the advantage that it's non-intrusive, but the disadvantage that everything would be posted twice. Maybe we'll persuade people to convert fast enough that the extra traffic won't kill us.
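The dual-posting idea can be sketched in Python with the standard `html.parser` module. The sending gateway turns each `<a href>` anchor into a numbered footnote for the plain-text copy, and a receiving gateway that sees `X-unstripped-pointer:` fetches the hypertext original instead. The `fetch` callable (Message-ID to article text) stands in for a real NNTP lookup and is an assumption, as is the choice to handle only anchors and paragraph breaks.

```python
from html.parser import HTMLParser

class FootnoteStripper(HTMLParser):
    """Collect the text of an HTML annotation, turning links into [n] footnotes."""

    def __init__(self):
        super().__init__()
        self.pieces = []   # running plain text
        self.links = []    # footnote targets, in order of appearance
        self._href = None  # href of the anchor we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append(self._href)
            self.pieces.append("[%d]" % len(self.links))
            self._href = None
        elif tag == "p":
            self.pieces.append("\n\n")

    def handle_data(self, data):
        self.pieces.append(data)

def strip_for_regular_group(html, hypertext_id):
    """Return (extra headers, plain body) for the copy posted to the real group."""
    parser = FootnoteStripper()
    parser.feed(html)
    parser.close()
    body = "".join(parser.pieces).strip()
    notes = "\n".join("[%d] %s" % (n, url) for n, url in enumerate(parser.links, 1))
    if notes:
        body += "\n\n" + notes
    return {"X-unstripped-pointer": hypertext_id}, body

def choose_version(headers, plain_body, fetch):
    """A receiving gateway prefers the hypertext original when one is named."""
    pointer = headers.get("X-unstripped-pointer")
    return fetch(pointer) if pointer else plain_body
```

For example, `<p>See <a href="http://example.invalid/faq.html">the FAQ</a> first.</p>` becomes "See the FAQ[1] first." followed by a footnote line "[1] http://example.invalid/faq.html".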

Selection of News

In the public group annotation server experiment, we've seen that, because the system is being used as a discussion/newsgroup type of service, we need a way of seeing what new annotations have been made. Some people have just been scanning the server's directory of annotations, but this rather dry approach works only because the traffic is comparatively tiny. We need a new solution, particularly if we're going to read news via the same interface.

One immediate implication of the news-->annotation shift is that newsgroups become less structured: in fact, anything on the web is likely to serve as a basis for a thread of discussion. This means traditional methods of selecting newsgroups won't fill the entire need. I think we need a "selection engine" à la WAIS. (I'm not too familiar with WAIS beyond the standard documentation, so I don't know if we can just take it over as-is, or if we'll need to work on it.) We feed the selection engine pages of data specifying what we're interested in. These can be traditional headers, like Newsgroups: alt.orchids, or subject lines, or the URLs of things we've written or annotated recently (our browser can keep track of this), etc. In fact, we can feed the selection engine rules that change the 'goodness' value of posts; for instance, I'd probably give a big positive delta to anything with From: henry@zoo.toronto.edu. I'd like to note in passing that this functionality will address the Usenet Interface Project's number 1 project. (Number 3 will basically fall out of the annotation style of doing things, and number 2 is covered later.)
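As a sketch of what such a selection engine might do (this is plain rule-based scoring, not WAIS itself), in Python: each rule names a header, a regular expression, and a 'goodness' delta. An article's score is the sum of the deltas of the rules it matches, and only articles scoring above a threshold are shown, best first. The rule format, the threshold, and the example URL are assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    header: str    # header to test, e.g. "From", "Newsgroups", "Subject"
    pattern: str   # regular expression searched for in that header's value
    delta: float   # change in 'goodness' when the pattern is found

def score(article, rules):
    """Sum the deltas of every rule that matches this article's headers."""
    total = 0.0
    for rule in rules:
        value = article.get(rule.header, "")
        if re.search(rule.pattern, value, re.IGNORECASE):
            total += rule.delta
    return total

def select(articles, rules, threshold=0.0):
    """Articles scoring above the threshold, highest score first."""
    scored = [(score(a, rules), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for s, a in scored if s > threshold]

rules = [
    Rule("From", r"henry@zoo\.toronto\.edu", 10),
    Rule("Newsgroups", r"(^|,)alt\.orchids(,|$)", 2),
    # Annotations of a page I wrote (the browser could add rules like this).
    Rule("X-annotated-URLs", re.escape("http://example.invalid/my-page.html"), 3),
    Rule("Subject", r"make money fast", -100),
]
```

Because rules only add or subtract, rules from different sources (the user, the browser's history, a moderator) can simply be concatenated.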

What about moderation? Ignoring for the moment the interface to traditional newsgroups, this selection-engine approach also answers the moderation problem. A moderator can edit a document (pointed to by the group's home page) which contains deltas or coefficients for the 'goodness' of postings. By feeding this document to the selection engine (which the browser could do automatically for the user), the moderator's choices are enforced. Traditional yes/no moderation would involve coefficients of 0 or 1: a 1 on each approved posting, and a 'wildcard' 0 for everything else. Interested moderators could rank postings, or use fractional values for borderline postings, etc.

I'd also like to mention that this provides a transparent mechanism for multiple moderators. I could, for instance, tell my browser to apply ftp://redpoll.mrfs.oh.us/red/aff.slc when I read alt.fan.feynman. I would then see the group through the selection rules that some motivated 'editor' has created. Others, whose tastes differ, would never be affected. People could also choose to ignore the moderator, and see basically an unmoderated version of the group.
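Moderation can then be sketched, in Python, as a per-article coefficient applied to the goodness value. The document format here (one Message-ID and coefficient per line, `*` as the wildcard, `#` for comments) is an assumption, and fetching the document from a URL such as the ftp one above is left out: documents are passed in as strings keyed by URL. A reader who maps a group to no moderator sees the unmoderated group.

```python
def parse_moderation(doc):
    """Parse 'message-id coefficient' lines; '*' is the wildcard for everything else."""
    coeffs = {}
    for line in doc.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, value = line.split()
            coeffs[key] = float(value)
    return coeffs

def moderated_view(articles, group, my_moderators, documents):
    """Message-IDs the reader will see, best first, under the moderator they chose."""
    source = my_moderators.get(group)          # None: ignore moderators entirely
    coeffs = parse_moderation(documents[source]) if source else {}
    default = coeffs.get("*", 1.0)             # no wildcard: moderator is silent
    kept = []
    for art in articles:
        goodness = art.get("goodness", 1.0) * coeffs.get(art["Message-ID"], default)
        if goodness > 0:
            kept.append((goodness, art["Message-ID"]))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [msgid for _, msgid in kept]
```

With a document approving `<1@x>` at 1, marking `<3@x>` borderline at 0.5, and a wildcard of 0, a reader who chose that moderator sees `<1@x>` then `<3@x>`; a reader who chose none sees all three articles.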

What about traffic? One benefit of moderated groups is that they are easier on the reader; another is that they are easier on the network. If groups with multiple versions of moderation develop (including unmoderated ones), the traffic load would go up. I'd reply that server-to-server communication could also use the selection engine: a site may elect to receive only the articles approved by the 'official' moderator.

I'd also mention that a smart server could make a note of the selection rules its users pass to it, and perhaps incorporate them into its own selection criteria.
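On the server side, here is a Python sketch of how a site might decide which articles to request from its neighbour: anything the 'official' moderator approved, plus anything in a group that enough of its own users' selection rules mention. Reducing each user's rules to a set of group names, and the `min_users` cut-off, are simplifying assumptions.

```python
from collections import Counter

class FeedSelector:
    """Decide which articles this site asks its upstream neighbour for (sketch)."""

    def __init__(self, officially_approved, min_users=2):
        self.approved = set(officially_approved)  # Message-IDs the official moderator approved
        self.interest = Counter()                 # group -> number of local users interested
        self.min_users = min_users

    def note_user_rules(self, groups_in_rules):
        """Record one user's selection rules, reduced here to the groups they name."""
        self.interest.update(set(groups_in_rules))

    def wanted(self, article):
        if article["Message-ID"] in self.approved:
            return True
        groups = [g.strip() for g in article["Newsgroups"].split(",")]
        return any(self.interest[g] >= self.min_users for g in groups)
```

A site with no interested users falls back to the moderator's selection alone, which is the traffic saving described above.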

Now what about traditional newsgroups again? I'd suggest that what we do now is twofold:

This is both backward and forward compatible.

Multimedia

The usenet is already sort of multimedia: there are groups like alt.binaries.pictures.orchids. But this isn't terribly transparent or automatic. The web is already multimedia, but it uses pointers to the images and other non-text items. If I were to annotate/post a picture, the current web paradigm would keep the image in one place, and send out a pointer to it. Clearly, this isn't a solution.

For sending multimedia, the first thing that comes to mind is MIME. I don't see why we can't just use it. There is one problem that I don't know whether MIME addresses (I haven't seen it mentioned, but I haven't looked hard): splitting large documents. Supposedly there are news gateways that reject or truncate articles that are too long, so the picture of a really nice Pholidota might have to be sent out in three or four parts. If MIME doesn't address this directly, I'd suggest the nntp-html gateway split the MIME document, post each part (noting the Message-IDs used), and then post a "cover" document (the "part 0" one) that 1) indicates it's a cover document, and 2) provides pointers to the pieces. Traditional folks would have much the same style of multipart postings to put together (except they'd have to wash the result through MIME), and web browsers would do the work automatically.
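As it happens, MIME does define a `message/partial` type (RFC 2046, and RFC 1341 before it) for splitting a message into numbered pieces. The Python sketch below instead follows the cover-document scheme just described: split an already MIME-encoded article, give each part a Message-ID, and post a "part 0" cover listing them in order. The Message-ID scheme and the cover's header names are hypothetical, and `fetch` stands in for an NNTP article lookup.

```python
def split_for_posting(payload, max_bytes, base_id):
    """Return (cover_text, {message_id: part_bytes}) for an encoded article."""
    parts = {}
    for n, start in enumerate(range(0, len(payload), max_bytes), 1):
        msgid = "<%s.part%d@example.invalid>" % (base_id, n)  # hypothetical IDs
        parts[msgid] = payload[start:start + max_bytes]
    cover_lines = ["X-cover-document: %d parts" % len(parts)]  # 1) marks the cover
    cover_lines += ["Part-%d: %s" % (n, msgid)                 # 2) points to pieces
                    for n, msgid in enumerate(parts, 1)]
    return "\n".join(cover_lines), parts

def reassemble(cover, fetch):
    """What a web browser would do automatically: fetch the parts and join them."""
    lines = cover.splitlines()
    if not lines or not lines[0].startswith("X-cover-document:"):
        raise ValueError("not a cover document")
    ids = [line.split(": ", 1)[1] for line in lines[1:] if line.startswith("Part-")]
    return b"".join(fetch(msgid) for msgid in ids)
```

A 10240-byte picture split at 4000 bytes yields three parts plus the cover; readers without a web browser can still collect the parts by hand, as they do today.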

Security

There has already been some brief discussion of a way of cryptographically signing documents for authentication. I don't know of any system that can be freely used everywhere, but if we come up with one, it would go some way towards 'guaranteeing' the moderation of a newsgroup: the moderator signs his selection document(s), and each entry includes a cryptographic hash of the article in question, to guarantee that nobody can pass off a different article in its place. This needs discussion.
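The hash half of this idea can be sketched in Python with `hashlib`. Each line of the moderator's selection document pairs a Message-ID with the SHA-256 digest of the approved article, so a forged article posted under that Message-ID no longer matches. Signing the selection document itself is deliberately left out, since the choice of signature system is still open; the one-line-per-entry format is an assumption.

```python
import hashlib

def digest(article):
    """SHA-256 of the article's exact bytes."""
    return hashlib.sha256(article).hexdigest()

def make_selection_doc(approved):
    """approved: {message_id: article_bytes} -> one 'message-id digest' line each."""
    return "\n".join("%s %s" % (msgid, digest(body)) for msgid, body in approved.items())

def verify(selection_doc, msgid, article):
    """True only if the moderator listed this Message-ID with this exact content."""
    for line in selection_doc.splitlines():
        listed_id, listed_digest = line.split()
        if listed_id == msgid:
            return listed_digest == digest(article)
    return False
```

The digests only help once the document carrying them is itself authenticated; otherwise a forger could simply publish a fake selection document.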

Multiple Servers, and Scale

One disadvantage of this proposed scheme is that, though the selection system may make newsreading faster by eliminating uninteresting documents, it is likely to greatly increase the number of postings to what could be termed minor groups (since any document could serve as the seed for a thread). On the other hand, by using the web, responsible people can rather transparently reduce the load on the net. It has always been possible to post a small article saying that the full text, or the full code, of some less popular issue is available on-line, for instance by ftp. But the additional overhead of actually running ftp and fetching the document means none but the most motivated readers will bother. To get an audience, the entire thing has to be posted. On the web, however, this indirection is merely one additional click away. A conscientious user could start off by posting merely a pointer to a document in his own webspace. If it proves more popular than he anticipated, he can always stave off the load by "editing the annotation" (reposting) to include the document directly.

This scheme, where any page can serve as a focus for discussion and a channel for distribution, has many of the same advantages as a mailing list, such as ease of starting and focus. If a subject increases in popularity, the scheme scales nicely. However, if the increased traffic results in a move away from carrying all of this 'minor' annotation traffic, we can use a multiple-server scheme. Currently, reading news from two or more nntp servers is not transparent; with this web scheme, it would be: a major point of the web is the transparent access to resources on many nodes. The two extremes are simple: either a 'group' is available on the local news server, or if not, the reader could use the server suggested in the group's home page. Intermediate cases, where a group is available at a non-local but closer site, require some thought. It would be nice if this finding were all automatic, à la Archie. If we used a proxy system, where articles are always retrieved from the local server (but the local server might have to fetch them from somewhere else), then a smart server could make note of the interest, and if it continues, could modify its selection criteria to begin a newsfeed. This scheme would make everything available, but there would be network traffic only when there was a real or anticipated interest.
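The proxy arrangement can be sketched in Python. The local server answers every request, fetching from elsewhere when it has to, counts remote fetches per group, and once a group has been fetched often enough it "starts a newsfeed" (here, it just adds the group to its feed list). `remote_fetch` and the simple count threshold are assumptions standing in for a real NNTP client and a real notion of sustained interest.

```python
from collections import Counter

class ProxyNewsServer:
    """Local server that always answers, fetching remotely when it must (sketch)."""

    def __init__(self, fed_groups, remote_fetch, feed_threshold=3):
        self.feeds = set(fed_groups)       # groups arriving via a normal newsfeed
        self.remote_fetch = remote_fetch   # hypothetical: (group, msgid) -> article
        self.feed_threshold = feed_threshold
        self.cache = {}                    # (group, msgid) -> article
        self.remote_hits = Counter()       # group -> remote fetches so far

    def get(self, group, msgid):
        key = (group, msgid)
        if key in self.cache:
            return self.cache[key]
        article = self.remote_fetch(group, msgid)
        self.cache[key] = article
        self.remote_hits[group] += 1
        # Continued interest in a non-fed group: start carrying it locally.
        if group not in self.feeds and self.remote_hits[group] >= self.feed_threshold:
            self.feeds.add(group)
        return article
```

Repeat requests for the same article are served from the cache and do not count as new interest, so only readers moving on to fresh articles push a group towards a full feed.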

Far Future?

If we move in the directions I mentioned, what will the future news system look like? What I have in mind is a much less centralized, much less controlled system. Any document on the web can serve as a seed for discussion. This will make for much more focused groups of articles, but it will require a smarter selection system to find interesting ones. This system will allow finer tuning of what counts as 'interesting', and if we use the smart-server approach, the news servers themselves can use this selection information to help minimize the network load. Moderation in the traditional sense (of not allowing an article to go out) will be gone, but will be replaced by a 'rating' or 'editing' scheme.

One problem would be the loss of a specific set of "core groups." The existing CFD/CFV system serves a purpose, which would be lost here. The main advantages of this system are that 1) it limits the number of groups a user has to be aware of, and reduces clutter and junk; 2) because many sites take only these core groups, it helps reduce network traffic; and 3) it prevents needless splintering and duplication. The first and third points would be addressed by the selection system. However, there would still be a need for one or a few respected official lists. These could serve as a starting point and, if we don't come up with a good smart-server approach, they could serve as the basis for the server-to-server selection criteria.

Conclusions

The existing network news system does a very good job. However, recent innovations and developments, including multimedia, selection engines, and the integration of resources into one web, provide ways we can improve the news system. I have suggested a system that can initially sit alongside the existing news system, providing users who switch with all the information in the old system, and all the advantages of the new. It would offer a way of being more selective about which articles to read, it would naturally "thread" items into their natural discussions, it would be transparently multimedia, and it would link into the world-wide web of networked resources. New "groups" of related articles can more easily be spun off, giving better focus, but the selection mechanism prevents this blizzard from blinding a user. A rating scheme would supplement the current moderating scheme, which would not only allow articles to be rated on a scale, but would cater for differing levels of interest. As the system develops, "smart" news servers could use their users' selection criteria to form their own, used when communicating with other servers. This would limit network traffic to that expected to be used, while still making everything available to the user.

Discussion

I will post this to several newsgroups involved in this discussion: news.future, comp.infosystems.www, comp.infosystems.wais, comp.mail.mime, and comp.multimedia. Followups will be directed to the first two, as they are the ones more than casually involved. While it is possible to annotate this document using NCSA's experimental server, the annotations wouldn't be seen by most of the network community. Therefore, nice though annotations may be, I'd have to ask you to post instead. (Particularly because anyone annotating is likely a web supporter, and I'll need as many supporters as possible when I let this loose on the Usenet world!)

Frederick.