
Fighting Spam with Open Source Tools

MESDA OSSUG: Open Source Solutions Users Group

October 17, 2007

Westbrook, Maine

Ted Guild

Head of W3C Systems Team

These slides:



W3C - Open Standards Organization

The World Wide Web Consortium (W3C) is an Open, International Standards Organization for the Web comprised of Technical Staff, numerous Member Organizations, and scores of individual Invited Experts.

Email correspondence with such a diverse and distributed group of participants is a vital part of the Consortium's operations. Besides regular contributors, we accept public comments on our Technical Specifications, particularly during last call periods, and in general welcome feedback from the Web Community.

Mailing Archives

W3C Jigsaw

We run many hundreds of Mailing Lists with thousands of recipients, a portion of which are publicly archived. As such we are particularly attractive to spammers.

Quick Background on Spam and Virus

There are numerous writings on this subject. A good high-level article on the causes, history, attempts to legislate and countermeasures is available at The New Yorker.

Email Address Harvesting

It is often only a matter of time before any email alias is harvested by spammers. There is no need to make it easy for them, though: increase their cost.

"SMTP Consortium" - Hugo Haas

Disclaimer: W3C is not responsible for the mess that is Email.

Partial View of our Mail Processing:

smtp flow diagram

Arrange your Defenses Wisely

Reject Stats for a Week

 # msgs    %age    why
  123210    3.55%  used IP address in HELO/EHLO
  620144   17.87%  used our name in HELO/EHLO
 1743615   50.24%  unrouteable addresses (no such address at our site)
   31466    0.91%  envelope sender on local blacklist
   13044    0.38%  header-from address on local blacklist
      61    0.00%  local addresses that do not send mail
    4719    0.14%  sender IP address on local blacklist
    7459    0.21%  rejected due to unencoded 8bits in Subject
       0    0.00%  forgeries rejected according to SPF records
     147    0.00%  forgeries rejected based on header patterns
   40508    1.17%  viruses etc rejected according to filename extensions
  118596    3.42%  viruses/trojans/worms rejected (using clamav)
  583677   16.82%  spams rejected with SA score > 7
 3286646   94.71%  total
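The two HELO/EHLO rows above are cheap checks that run before any message content is read. A minimal sketch of the idea, not the actual mail server configuration: the host names in `OUR_NAMES` and the function name are hypothetical, and a real setup would apply the "our name" check only to connections from outside hosts.

```python
import ipaddress
from typing import Optional

# Hypothetical names our own servers use; replace with your site's names.
OUR_NAMES = {"example.org", "mail.example.org"}


def helo_reject_reason(helo_arg: str) -> Optional[str]:
    """Return a rejection reason for a HELO/EHLO argument, or None to accept."""
    name = helo_arg.strip().lower().rstrip(".")

    # A bare IP address is not a host name. (A bracketed address literal
    # such as [192.0.2.1] is not parsed as an IP here and is let through.)
    try:
        ipaddress.ip_address(name)
        return "used IP address in HELO/EHLO"
    except ValueError:
        pass

    # An outside host has no business introducing itself with our name.
    if name in OUR_NAMES:
        return "used our name in HELO/EHLO"

    return None
```

Both checks cost almost nothing compared with virus scanning or content scoring, which is why they sit near the front of the defenses.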

Choose your Public Mail Server Carefully

The Mail Exchange (MX) your organization exposes to the public can, and perhaps should, be completely different from the server its users connect to.

Enter Exim. Unlike mail servers (Mail Transfer Agents, or MTAs) of the past, it is a bit more wary.


Besides reducing spam in your in-box, you want to protect your own and your organization's image.


Let's face it, some mail clients and operating systems are more susceptible to viruses than others. Designing intentional hooks for email to interact with the rest of the system carries extreme risk, and coding limits on these interactions will be an uphill battle. There should be zero trust for something anybody can send you. People are often duped (socially engineered) into opening attachments, and they willingly forward cutesy or humorous attachments to friends and family.


	  This virus works on the honor system. Please randomly delete
	  some of your files and forward this to everyone you know.

-- UNIX Joke Virus

There are many free (as in source and free as in beer) and commercial anti-virus systems operating at the server or personal computer level. Some of the prominent ones are contrasted in LinuxWorld Fight Club.

Use one at your mail server and have it update its signature database regularly. Long before LinuxWorld's rating, we had been rather pleased with ClamAV.
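Alongside a signature scanner, the reject table earlier lists a cheaper check: refusing attachments by filename extension. A rough sketch of that check using Python's standard `email` package follows; the extension list and the function name are assumptions for illustration, not the list used in production.

```python
from email.message import EmailMessage

# Hypothetical set of extensions to refuse; tune for your own site.
BLOCKED_EXTENSIONS = (".exe", ".scr", ".pif", ".bat", ".com")


def blocked_attachments(msg: EmailMessage) -> list:
    """Return the attachment filenames that end in a blocked extension."""
    hits = []
    for part in msg.walk():  # visits the message and every MIME sub-part
        filename = part.get_filename()
        # Match on the final extension so "photo.jpg.exe" is caught too.
        if filename and filename.lower().endswith(BLOCKED_EXTENSIONS):
            hits.append(filename)
    return hits
```

A check like this rejects a message without ever handing it to the scanner, which is also useful when a new virus appears before the signature database has been updated.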

Using data to fight spam
White, Black and Gray-listing

In addition to the data available on the net, take advantage of data derivable from your own mail. This is the information age, after all.
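As an illustration of combining the three kinds of lists, here is a minimal in-memory sketch. The class name, the five minute delay and the keying on client IP address, sender and recipient are assumptions for this example; a real deployment would use persistent storage and expire old entries.

```python
import time


class Greylist:
    """Toy white/black/gray-list decision keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> time of first delivery attempt
        self.whitelist = set()      # client IPs always accepted
        self.blacklist = set()      # client IPs always rejected

    def check(self, client_ip, sender, recipient):
        """Return 'accept', 'reject', or 'defer' (an SMTP 4xx temporary failure)."""
        if client_ip in self.blacklist:
            return "reject"
        if client_ip in self.whitelist:
            return "accept"
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(triplet, now)
        # Legitimate servers retry after a temporary failure; much spam
        # software never does, so unknown triplets are deferred at first.
        if now - first < self.min_delay:
            return "defer"
        return "accept"
```

The white list keeps known correspondents from being delayed, which is the main complaint about gray-listing.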

In addition to these list approaches there are the honeypots already mentioned, and Bayes filtering. Bayes filtering is statistical analysis of the contents (message and headers) of bodies of known spam and ham (legitimate email), so that a spam probability can be assigned to incoming mail. Bayes can be part of a heuristics system on your mail server that is regularly fed ham from your outgoing mail and spam from your honeypots.
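To make the idea concrete, here is a deliberately simplified naive Bayes sketch. It is not the algorithm of any particular filter: the tokenizer, the add-one smoothing and the class and method names are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Simplified naive Bayes: counts how many spam/ham messages contain each token."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        """Feed one known message; label is 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Work in log-odds; start from the (smoothed) prior.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokenize(text):
            # Add-one smoothing keeps unseen tokens from producing zeros.
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))
```

In the arrangement described above, `train(..., "ham")` would be fed from outgoing mail and `train(..., "spam")` from honeypot addresses.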

Probable Spam

Not all mail falls easily into ham or spam. If there is any doubt, it is better to mark the message as uncertain and let the end users decide for themselves.
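One way to picture this is as two thresholds on the spam score instead of one. In the sketch below the reject threshold of 7 comes from the reject table earlier in these slides; the lower tagging threshold of 5 and the function name are assumptions for illustration.

```python
REJECT_ABOVE = 7.0  # matches "SA score > 7" in the reject table
TAG_ABOVE = 5.0     # assumed threshold for "probable spam"


def disposition(score: float) -> str:
    """Map a spam score to 'reject', 'tag' (probable spam), or 'deliver'."""
    if score > REJECT_ABOVE:
        return "reject"   # refused during the SMTP conversation
    if score > TAG_ABOVE:
        return "tag"      # delivered, but marked so the user can filter it
    return "deliver"
```

Messages in the middle band still reach the recipient, who can file them into a separate folder and check it occasionally for false positives.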

Ever-evolving spammer techniques and counter-techniques

It is a vicious circle of spammers countering the counter-techniques. Looking to maximize their distribution, some regularly run their spam through a gauntlet of anti-spam systems to see how it fares.

Bayes filtering was so effective (mid to high 90s percentage) that spammers send many millions of innocuous-looking messages without any product URI, for the sole purpose of polluting Bayes databases. DSPAM's algorithm is less susceptible to Bayes pollution.

Using PDF and image attachments as the body of the email was a new technique for a while this past summer.

Unless they want to be inundated with spam, organizations pretty much have to have postmasters and system administrators who stay abreast of emerging trends, and/or bring in consulting or commercial resources to assist with defenses. it's free and there are webcasts for those who cannot be present physically.

Do not go it alone. Long gone are the days when administrators and developers could cost-effectively maintain their own concocted filters.

Consider spam and virus systems that auto-update and take advantage of the Open Source developer and user community.