Fighting Spam and Viruses at the Server, Part II
- Content Tests
- Popular Content Tests
- ...And That's It for Content Filtering
In the first article in this series on "Fighting Spam and Viruses," we looked at the dangers lurking in every user's inbox, and enumerated tests you can perform on an E-mail message's envelope to determine whether it is spam.
However, these techniques are not the only tools in your spam-fighting arsenal. Far more powerful, in many ways, are the tests that you perform on an E-mail message's actual content or, more precisely, the tests that specially designed software performs for you.
Content Tests
Why resort to testing message contents? Privacy issues plus the sheer time it would take to hand-inspect every piece of mail passing through your domain are the two major reasons to use automated tools when testing what's inside an E-mail to see if it's spam.
It isn't always possible (or practical) to make a spam diagnosis on the basis of the envelope. Few domains use SPF entries yet, and someone's being on a DNSBL may or may not mean they did anything wrong, as we discussed in Part I. DNSBLs are also vulnerable to Distributed Denial of Service (DDoS) attacks, as we saw last year when two DNSBLs were forced to shut down. A strategy that relies exclusively on DNSBLs leaves you vulnerable when this happens, either temporarily or permanently.
On the other hand, opening up someone else's E-mail can also get you in a huge amount of trouble over privacy violations. It exposes you to invasion of privacy lawsuits for one thing, unless you are the administrator for an organization that explicitly warns its users that their mail is monitored.
Programmatic content testing does have a downside, however. Content analysis is more expensive, resource-wise, than running a few quick checks on an envelope. You can check the envelope without actually downloading the mail, but checking content requires both the bandwidth and the disk space needed to pull the mail onto your server. Then there's the processor time required to analyze the mail headers (yes, they fall under content) and the message body.
These factors can add several seconds to the time it takes to process a single mail item. On a mail server that processes hundreds of thousands of messages a day, these tiny extras add up fast. Soon, you can find yourself investing in more hardware to distribute the load of your anti-spam solution, revealing yet another real cost of spam.
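To put rough numbers on that load, here is a back-of-the-envelope sketch in Python. The figures are illustrative assumptions, not measurements: 300,000 messages a day and 3 seconds of content scanning per message. The function name scanning_load is made up for this example.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def scanning_load(messages_per_day, seconds_per_message):
    """Return (total scan seconds per day, minimum number of scanners).

    The scanner count assumes mail arrives evenly around the clock and
    every scanner stays fully busy, so treat it as a lower bound.
    """
    total_seconds = messages_per_day * seconds_per_message
    scanners = math.ceil(total_seconds / SECONDS_PER_DAY)
    return total_seconds, scanners

# Assumed figures: 300,000 messages/day, 3 seconds of scanning each.
total, scanners = scanning_load(300_000, 3)
print(f"{total:,} scan-seconds per day, at least {scanners} parallel scanners")
```

Under those assumptions, the scanning alone comes to 900,000 seconds of work per day, more than ten times the 86,400 seconds in a day, so at least 11 scanning processes would have to run in parallel just to keep up. Real mail arrives in bursts, so the practical requirement would be higher.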
However, the thoroughness of content-filtering allows you to make diagnoses with more confidence, since you can look for features that are symptomatic of spam, compare the body of the E-mail to databases of known spam, and even use statistical techniques to train a "learning engine" how to recognize spam based on recipients' individual tastes. Ultimately, the costs of hardware and bandwidth pale when compared to the cost of lost productivity, frustration, and the occasional virus problem or social engineering spam breach.
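As a rough illustration of the "learning engine" idea, the Python sketch below trains a very small word-frequency classifier on mail a recipient has already marked as spam or legitimate ("ham"). It is a simplified naive Bayes scorer written for this article, not the algorithm of any particular product; the class name TinySpamFilter, the tokenizer, and the sample messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())

class TinySpamFilter:
    """Per-recipient word-frequency filter (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one message the recipient has labeled."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam (0.0 to 1.0)."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_messages = sum(self.messages.values())
        tokens = tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this class is in the training mail.
            log_p = math.log((self.messages[label] + 1) / (total_messages + 2))
            total_tokens = sum(self.counts[label].values())
            # Add-one smoothing keeps unseen words from zeroing the score.
            denominator = total_tokens + len(vocab) + 1
            for token in tokens:
                log_p += math.log((self.counts[label][token] + 1) / denominator)
            log_score[label] = log_p
        # Convert the two log scores into a normalized probability.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])

# Hypothetical training mail for one recipient.
mail_filter = TinySpamFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("limited offer cheap watches", "spam")
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("lunch on monday with the team", "ham")

print(mail_filter.spam_probability("cheap pills offer"))    # scores above 0.5
print(mail_filter.spam_probability("monday team meeting"))  # scores below 0.5
```

Because each recipient trains a separate filter on their own mail, the same word can count against a message for one user and in its favor for another, which is what tailoring to "individual tastes" amounts to in practice. A production filter would need far more training mail, header analysis, and a carefully chosen threshold.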