W3C Mail

Spam filtering options at W3C

Various spam filtering options are available for use on W3C mailing lists. If you maintain a mailing list that has a spam problem, you should choose one of these methods to make the spam go away.

If you are subscribed to a W3C list that has spam problems, you should bug the list maintainer to have the list's configuration fixed.

Recommended configuration for lists

The recommended configuration for lists that need spam filtering is to enable the archive approval system. This system trapped about 17,000 attempted spams to our lists in October 2002 (messages whose posters never bothered to have them approved).

You only need to read further if you want to know about the various systems and how they work.

Archive approval system

The archive approval system is probably the most effective way to prevent spam from being distributed to our lists.

The system works as follows: the first time someone posts to a list, they are immediately sent a response asking for approval to archive their message on our site. Spammers generally ignore these responses, while regular participants just need to submit a simple web form giving us permission to archive their message.

The main intent of this system is to make it clear to posters that their messages are going into our online archives, but it also happens to be very effective at reducing spam.
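The challenge flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual W3C implementation: the `ApprovalGate` class, its method names, and the assumptions that messages are held until approval and that approval is remembered per sender address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    """Hypothetical model of an archive approval challenge."""

    approved: set = field(default_factory=set)   # senders who submitted the web form
    pending: dict = field(default_factory=dict)  # sender -> messages held for approval

    def receive(self, sender: str, message: str) -> str:
        sender = sender.lower()
        if sender in self.approved:
            return "distribute"
        # Unknown poster: hold the message and send back the approval request.
        self.pending.setdefault(sender, []).append(message)
        return "challenge"

    def approve(self, sender: str) -> list:
        """Called when the poster submits the form; returns the held messages."""
        sender = sender.lower()
        self.approved.add(sender)
        return self.pending.pop(sender, [])


gate = ApprovalGate()
first = gate.receive("alice@example.org", "hello")          # first post is challenged
released = gate.approve("alice@example.org")                # form submitted
second = gate.receive("alice@example.org", "follow-up")     # later posts go through
spam = gate.receive("spammer@example.net", "buy now")       # never approved, stays held
```

In this sketch a spammer who ignores the challenge simply leaves a message in `pending` forever, which is the property that makes the approach effective as a filter.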

The W3C systems team recommends using this system on any lists that have spam problems.

When this system is enabled for a given list, the accept lists are normally turned off; i.e. anyone in the world can post to the list if they go through the archive approval challenge.

In rare cases, e.g. w3c-announce, it may be desirable to enable more than one of these systems, for example: use accept lists to restrict posting to a specific list of people (e.g. the W3C Comm Team), and also enable the archive approval system as a spam filter. If you request the archive approval system be set up for a list and also want the accept lists enforced, be sure to specify that in your request, since the default is to turn off the accept list enforcement.

Accept lists

"Accept lists" generally allow only people who are subscribed to a list to post to it. Many of our lists also have an "accept2" list, which is a list of alternate addresses that are allowed to post to the list. (This is useful for people who have many different email addresses and can't keep their identities straight ;)

At W3C we also have master accept lists which are lists of all the addresses on all our accept lists, divided into three categories:

  1. all addresses authorized to post to team lists (accept.team)
  2. all addresses authorized to post to team and member lists (accept.member)
  3. all addresses authorized to post to any of our lists (accept.all)

A list can be configured to use one of these master accept lists in addition to its local accept list(s). The master accept lists are generated automatically every 4 hours by gathering entries from all the other accept lists.
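As a rough sketch of how the master accept lists could be assembled and consulted: the list names, addresses, the "visibility" labels, and the reading that each master list is the union of the accept lists at or below its level are all assumptions made for illustration, not a description of the real generation script.

```python
# Hypothetical input: list name -> (visibility, addresses on its accept lists).
LOCAL_ACCEPT_LISTS = {
    "team-example": ("team", {"alice@example.org"}),
    "member-example": ("member", {"bob@example.com"}),
    "public-example": ("public", {"carol@example.net", "alice@example.org"}),
}

# Which visibility levels feed each master list (an assumed reading of the text).
MASTER_SOURCES = {
    "accept.team": {"team"},
    "accept.member": {"team", "member"},
    "accept.all": {"team", "member", "public"},
}


def build_master_lists(local_lists):
    """Gather entries from every local accept list into the three master lists."""
    masters = {name: set() for name in MASTER_SOURCES}
    for visibility, addresses in local_lists.values():
        for name, sources in MASTER_SOURCES.items():
            if visibility in sources:
                masters[name] |= {a.lower() for a in addresses}
    return masters


def may_post(sender, local_accept, master=frozenset()):
    """A sender may post if listed locally or on the configured master list."""
    sender = sender.lower()
    return sender in local_accept or sender in master


masters = build_master_lists(LOCAL_ACCEPT_LISTS)
```

A periodic job (every 4 hours, per the text) would rerun something like `build_master_lists` and publish the results for the list software to read.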

This system generally works fairly well, but:

Therefore, any lists with spam problems may be better served by the new archive approval mechanism.

Spamassassin

Spamassassin is a popular heuristic-based spam labelling system. It has a huge set of tests that it runs on each incoming message; for example, it looks for:

Each of spamassassin's tests has a score, and if the sum of all the scores exceeds a certain threshold the message is labelled as spam.
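The scoring idea can be illustrated with a toy example. The three tests and their scores below are invented for illustration and are not real spamassassin rules; only the sum-and-compare logic and the 5.0 default threshold come from this document.

```python
THRESHOLD = 5.0  # default threshold used on W3C lists, per this document

# Hypothetical tests: (name, score, predicate). Not spamassassin's real rule set.
TESTS = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_FREE_MONEY", 2.5, lambda m: "free money" in m["body"].lower()),
    ("MANY_EXCLAMATIONS", 1.5, lambda m: m["body"].count("!") >= 3),
]


def score(message):
    """Return the total score and the names of the tests that matched."""
    hits = [(name, points) for name, points, test in TESTS if test(message)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(message, threshold=THRESHOLD):
    """Label a message as spam when its total score exceeds the threshold."""
    total, _ = score(message)
    return total > threshold


spam = {"subject": "FREE MONEY", "body": "Get free money now!!!"}
ham = {"subject": "Agenda for Tuesday", "body": "See you then!"}
```

No single test decides the outcome here; a message is labelled only when enough weak signals add up, which is why heuristic filters of this kind can also mislabel legitimate mail.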

If spamassassin is enabled for a given list at W3C, messages that exceed the default threshold (5.0) are not distributed to the list; instead, they are labelled by spamassassin and forwarded to the list maintainer with this header added:

X-Diagnostic: Probable spam caught by spamassassin
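The routing step described above might look something like this sketch, which uses Python's standard `email` package. The `route` function, the addresses, and the idea that the score is passed in as a number are assumptions for illustration; the real mail pipeline is not shown in this document.

```python
from email.message import EmailMessage

DIAGNOSTIC = "Probable spam caught by spamassassin"


def route(msg, score, list_address, maintainer_address, threshold=5.0):
    """Return the address a message should be delivered to.

    Messages over the threshold are labelled and diverted to the maintainer
    instead of being distributed to the list.
    """
    if score > threshold:
        msg["X-Diagnostic"] = DIAGNOSTIC
        return maintainer_address
    return list_address


suspect = EmailMessage()
suspect["From"] = "someone@example.net"
suspect["Subject"] = "FREE MONEY"
suspect.set_content("Get free money now!!!")

ok = EmailMessage()
ok["From"] = "alice@example.org"
ok["Subject"] = "Agenda for Tuesday"
ok.set_content("See you then!")

suspect_dest = route(suspect, 5.5, "list@example.org", "maintainer@example.org")
ok_dest = route(ok, 0.3, "list@example.org", "maintainer@example.org")
```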

This might be a reasonable thing to use on lists where neither the archive approval system nor accept lists are wanted, but we generally don't recommend it for use on our lists, partly because there is currently no easy way for list maintainers to resend messages that were incorrectly labelled as spam to the list without having the message trapped by spamassassin again.

Spamassassin is very useful as a general spam filter for personal email, however.

Removing spam from archives

Annospam allows users (only Staff for the time being) to remove messages that have been archived. It is intended for removing mail identified as spam, NOT for archive editing in general. Archive edits should be requested in accordance with the W3C Archive Editing Policy.

Forgery prevention

We have deployed SPF (Sender Policy Framework) for forgery prevention; it rejects messages that have forged return-paths according to DNS entries published by domain owners.
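To give a feel for what an SPF check does, here is a deliberately simplified sketch. It evaluates only the `ip4`, `ip6` and `all` mechanisms of an already-fetched record and skips everything that needs DNS lookups (`a`, `mx`, `include`, `redirect`); the example record and IP addresses are hypothetical, and a real deployment would use a full SPF library rather than this function.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, client_ip):
    """Evaluate a simplified SPF record against the connecting client's IP."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF policy published
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is given
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Other mechanisms need DNS lookups and are ignored in this sketch.
    return "neutral"


# Hypothetical record a domain owner might publish as a DNS TXT entry.
RECORD = "v=spf1 ip4:192.0.2.0/24 -all"
listed = check_spf(RECORD, "192.0.2.17")
forged = check_spf(RECORD, "198.51.100.5")
```

A receiving server would look up the record for the domain in the message's return-path and reject mail when the result is `fail`, which is the case for the second address above.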

One problem with the archive approval and accept list systems is that they use the 'From:' line of email to determine who sent a message, and that header can be easily forged as well.

If that problem becomes more widespread, we may modify the archive approval system to look for specific From: headers (e.g. including a full name), or to allow individuals to tell us to only accept mail that is PGP/GPG signed, or that contains some specific phrase in the body. We have not yet allocated staff time to work on that.

(Also, SPF may eventually come to be applied against headers other than the return-path.)

Email address obscurity

Once in a while we receive a request to obfuscate the email addresses on our site, or in our archives.

I (Gerald) am against this, because I don't think it really solves the problem, and it inconveniences real people trying to get stuff done.

It is very difficult to keep an email address secret for any reasonable period of time; even our team-internal lists invariably get exposed to the public when someone forwards a message somewhere else that has a public archive.

Mark Pilgrim has written an excellent article on this subject, Club vs. Lojack solutions.

In the case of email obfuscation, harvesters never go away, they just disproportionately affect those who don't obfuscate, until enough people obfuscate that the harvesters get smarter, everybody's wasted a lot of time, everybody's email is still getting harvested, and we're all back where we started.

See also: Email address obfuscation in mailing list archives article in the systeam blog.


Gerald Oskoboiny
Last modified $Date: 2010/06/04 19:04:23 $ by $Author: gerald $