Collecting mail garbage is a tough job!
Many people know how an anti-virus works: everyone has heard of signatures, heuristics, behavioural analysis and so on. The concept of an anti-spam is also familiar to many. But how does this protection component actually work?
Let’s begin with the fact that the work of spammers fundamentally differs from the activities of virus writers. Virus writers must create programs that have nothing in common with the samples already recorded in virus databases or with the rules that govern behavioural analysers. Spammers, by contrast, must make their messages look as legitimate and trustworthy as possible. And although you can use a list of allowed processes to restrict the flow of malicious programs onto PCs, that is not possible with e-mail, because nobody can list every legitimate sender in advance (although you can limit the flow of messages coming from specific domains for certain groups of employees).
Naturally, most spam is easy to distinguish from legitimate e-mail, but what if we make a mistake because we can’t tell a real penalty notification from a fake one? And how can a program, which by definition cannot match the capabilities of the human brain, make that distinction?
The simplest (and the most common) method used in anti-spam involves fuzzy checksums and Bayesian filtering. Basically, a checksum is calculated for every e-mail message using special formulas, and that checksum is compared with the checksums in the anti-spam database (a Bayesian filter similarly scores the words of a message against statistics gathered from known samples). The method is simple, which, of course, is its biggest advantage, but its disadvantages outnumber its advantages.
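To make the checksum idea more tangible, here is a minimal sketch in Python. It is not how Dr.Web or any particular product calculates its checksums; the function names, the three-word fragments and the 0.6 threshold are all assumptions chosen for illustration. The message is normalized, cut into overlapping word fragments, each fragment is hashed, and the resulting set is compared with the sets stored for known spam samples.

```python
import hashlib
import re


def fingerprint(text: str, size: int = 3) -> set[str]:
    """Return a set of short hashes of overlapping word fragments.

    Hypothetical helper: lowercasing and dropping punctuation means that
    trivial edits (extra exclamation marks, changed case) do not alter it.
    """
    words = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    if len(words) < size:
        fragments = [" ".join(words)]
    else:
        fragments = [" ".join(words[i:i + size])
                     for i in range(len(words) - size + 1)]
    return {hashlib.sha256(f.encode()).hexdigest()[:8] for f in fragments}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared fragments divided by all fragments."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_like_known_spam(body: str, database: list[set[str]],
                          threshold: float = 0.6) -> bool:
    """True if the message is close enough to any stored spam sample.

    The threshold is an assumed value, not a recommended setting.
    """
    candidate = fingerprint(body)
    return any(similarity(candidate, known) >= threshold for known in database)


# The "anti-spam database": fingerprints of spam samples seen earlier.
database = [fingerprint("Buy cheap watches now, best price guaranteed, click here today")]

print(looks_like_known_spam(
    "BUY cheap watches now!!! Best price guaranteed - click here tonight", database))
print(looks_like_known_spam(
    "Hi Anna, the meeting is moved to Thursday at ten", database))
```

A slightly edited copy of a known sample still matches, while an unrelated message does not. A message that has no counterpart in the database is simply invisible to this kind of check.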
The main disadvantage is that, on its own, this method cannot filter out most spam, so it has to be combined with other filtering methods. We’ll talk about them later.
The second disadvantage is that it requires persistent training and/or frequent updating. Since the anti-spam database consists of the checksums of specific spam samples, new samples of unwanted e-mails must constantly be added to it for filtering to remain effective. Consequently, it is hard for such an anti-spam to work in networks that are isolated from the Internet, where frequent updates are unavailable.
The third disadvantage is graphical spam. Have you ever noticed how much spam comes in the form of images? Text drawn inside an image is not available for checksum calculation, so an anti-spam that relies only on Bayesian lists lets those messages through.
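The dependence on training can be shown with a small sketch of the Bayesian side of this method. This is a simplified naive Bayes scorer written for illustration only; the class name `TinyBayesFilter`, the tokenization and the smoothing are assumptions, not a description of any vendor's filter.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Simplified naive Bayes spam scorer (illustrative sketch only)."""

    def __init__(self) -> None:
        # How many training messages of each class contained a given word.
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text: str) -> set[str]:
        """Distinct lowercase words and numbers in the message."""
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def train(self, text: str, label: str) -> None:
        """Add one sample; label must be "spam" or "ham"."""
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokens(text))

    def _likelihood(self, token: str, label: str) -> float:
        # Add-one smoothing so that unseen words never give a zero.
        return ((self.token_counts[label][token] + 1)
                / (self.message_counts[label] + 2))

    def spam_probability(self, text: str) -> float:
        """Estimated probability, between 0 and 1, that the text is spam."""
        spam_n = self.message_counts["spam"]
        ham_n = self.message_counts["ham"]
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        for token in self.tokens(text):
            log_odds += math.log(self._likelihood(token, "spam")
                                 / self._likelihood(token, "ham"))
        return 1 / (1 + math.exp(-log_odds))


spam_filter = TinyBayesFilter()
print(spam_filter.spam_probability("buy cheap watches"))  # untrained: no opinion

spam_filter.train("cheap watches buy now", "spam")
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("meeting moved to thursday", "ham")
spam_filter.train("lunch on thursday", "ham")
print(spam_filter.spam_probability("buy cheap watches"))
print(spam_filter.spam_probability("meeting on thursday"))
```

Before any samples are added, the filter cannot tell the messages apart. Its verdicts are only as good and as fresh as the samples it has been fed, which is exactly why an isolated network is a problem.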
So, the Bayesian method requires the help of additional filtering methods. The most common one is DNSBL (Domain Name System Blacklists): lists of IP addresses and domains known to send spam. In effect, they are analogous to the blacklists and whitelists of e-mail addresses available for editing in Dr.Web Security Space, except that they are maintained by some Internet service.
This is a powerful tool, but it has its shortcomings. The services that maintain DNSBL databases are completely independent from anti-spam vendors. Accordingly, anti-spam vendors have no control over the quality of these lists, and the user gets a time bomb “under their bonnet”. The point is that senders are added to the databases after users contact the service’s owners or after the service itself detects spam. But, as you know, sender data can be tampered with, or you can accidentally become a sender of spam yourself, for example if your computer is infected.
You could find yourself in a situation where you can’t receive e-mails from someone you know because that person was blacklisted and doesn’t know it because they use a different anti-spam. And, by the way, getting removed from a DNSBL database can involve a fee; the administrators need to make a living somehow.
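For the curious, a DNSBL check is usually an ordinary DNS query: the octets of the sending server's IPv4 address are reversed, the name of the list is appended, and the resulting host name is looked up. If the name resolves, the address is listed. Below is a minimal sketch of that convention in Python. The zone `dnsbl.example.org` is a placeholder rather than a real service, and the `resolve` parameter is an assumption added so the logic can be tried without touching the network.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the host name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """True if the DNSBL zone returns an address record for this IP.

    In this sketch any lookup failure is treated as "not listed",
    which also hides resolver outages.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


def fake_resolver(name: str) -> str:
    """Stand-in for a real DNS lookup: only one test address is 'listed'."""
    if name == "99.2.0.192.dnsbl.example.org":
        return "127.0.0.2"
    raise socket.gaierror("name not found")


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
print(is_listed("192.0.2.99", "dnsbl.example.org", resolve=fake_resolver))
print(is_listed("192.0.2.1", "dnsbl.example.org", resolve=fake_resolver))
```

Note that the verdict rests entirely on what the list operator has published. The mail server asking the question has no way of knowing why an address was listed or whether the listing is still deserved.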
We will talk about an alternative approach to spam filtering in upcoming issues.
Look “under the bonnet”: the most interesting things can be found there! Learn about what’s under the bonnet of Dr.Web anti-spam here.