Security

Spam - Advanced

 

Anti-Spam Technologies

 

    Mail Server Blacklists

      Good: Easy-to-implement. Blocks incoming spam before processing by virus scanners and other anti-spam techniques, thus reducing the load on the email servers.
      Bad: Clumsy. Incomplete. Inaccurate. Subject to human error. Often based on an individual's opinion.
      Role: A basic first pass filter to eliminate up to 50% of spam.

      Blacklists are lists of email servers either known to be used by spammers or known to have security weaknesses (open relays) that would let spammers relay mail through them. They are maintained automatically or by volunteers. The source (IP) address from which each email originates is compared against one or more blacklists, and every message that appears to originate from a matching source is blocked. Some Internet Service Providers (ISPs) subscribe to such blacklists, and automatically refuse any mail from servers on them.
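
      The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the list entries are documentation-only addresses rather than real spam sources, and the function names are hypothetical, not taken from any particular mail server.

```python
import ipaddress

# Hypothetical blacklist entries. 192.0.2.0/24 and 198.51.100.7 are
# addresses reserved for documentation, not real spam sources.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blacklisted(source_ip, blacklist=BLACKLIST):
    """Return True if source_ip falls inside any blacklisted network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in blacklist)


def accept_connection(source_ip):
    """Refuse mail from a listed source before any other filter runs."""
    return not is_blacklisted(source_ip)
```

      Note that the check looks only at the source address, never at the message itself, which is why every message from a listed source is blocked regardless of its content.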

      This anti-spam technique works poorly for several reasons:

      1. They're incomplete.
        Blacklists never contain more than about half the servers from which spam originates.
      2. They're ineffective.
        Spammers change source addresses often and extremely quickly. Spammers are also usually adept enough to forge their mail headers, masking their true source address and setting up an innocent third party to take the blame and be blacklisted unfairly. Worse yet, spammers sometimes use these lists to find vulnerable servers through which to relay their mail.
      3. They're inaccurate.
        It's easy and common for an entirely innocent address to find its way onto a blacklist. You won't know you are on a blacklist until you happen to discover you're not receiving all the email you expect, and it is then difficult to have your address removed. An entire ISP's network may become blacklisted by unwittingly hosting a spammer who exploits a subscriber's server, causing all subscribers' email to be blocked.
      4. They block legitimate mail.
        David Nelson, a senior industry analyst at Giga Information Group, says a recent study found that MAPS blocked 24% of spam with 34% false positives. Legitimate and important email messages are lost.
      5. They're annoying.
        Email account holders do not have control of their own email. Email you actually want to receive may be blocked, or you may be prevented from communicating with someone wrongly blacklisted. Unless the affected servers are yours, there is little you can do about it except notify the affected party by telephone, and wait for days or weeks as they attempt to clear their name and get de-listed.

      The Spam Problem: Moving Beyond RBLs - an argument against, and alternatives to, Blacklists, by Philip Jacob

     

    Signature-Based Filtering

      Good: Rarely blocks legitimate mail.
      Bad: Catches only 50-70% of spam.
      Role: A first-pass filter on big email services.

      Signature-Based filters work by comparing incoming email to a database of known spam email messages, and filtering those that match. In order to tell whether two emails are the same, the systems calculate "signatures" for them. One way to calculate a signature for an email would be to assign a number to each character, then add up all the numbers. It would be unlikely that a different email would have exactly the same signature.

      The way to attack a signature-based filter is to add random stuff to each copy of a spam, to give it a distinct signature. When you see random junk in the subject line of a spam, that's why it's there, to trick signature-based filters.
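
      The additive signature described above, and the random-junk attack on it, can be sketched as follows. This is a toy illustration only: the sample message and helper names are hypothetical, and a real filter would presumably use something more robust than a plain character sum.

```python
import random
import string


def additive_signature(message):
    """Toy signature from the text: add up the character codes."""
    return sum(ord(ch) for ch in message)


def add_random_junk(message, length=8, rng=random):
    """Append random letters, as a spammer might, to change the signature."""
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return message + " " + junk


# Hypothetical database holding the signature of one known spam message.
known_spam = "Buy cheap pills now"
spam_signatures = {additive_signature(known_spam)}


def matches_known_spam(message):
    """Return True if the message's signature is in the spam database."""
    return additive_signature(message) in spam_signatures
```

      An exact copy of the known spam matches, but the same message with junk appended does not, because the extra characters raise the sum. A plain sum also ignores character order, so "ab" and "ba" share a signature, which is one reason to treat this scheme as an explanation rather than a design.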

      Signature-based filters have never had very good performance, because as soon as the filter developers figure out how to ignore one kind of random insertion, the spammers switch to another.

     

    Rule-Based (a.k.a. Heuristic and Content) Filtering

      Good: The best catch 90-95% of spam. Easy-to-implement.
      Bad: Static rules. Relatively high false positives.
      Role: Easy server-level solution.

      Applies a set of rules to each message, looking for patterns that indicate spam - specific words and phrases, excessive use of uppercase and exclamation points, malformed headers, dates in the future or the past, etc. - and assigns a rating which reflects the likelihood of the message being spam.
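
      A minimal sketch of such a rating scheme is shown below. The three rules, their weights, and the threshold are all invented for illustration; real rule-based filters use far larger rule sets with tuned weights.

```python
import re


def has_spam_phrase(subject, body):
    """Rule: the message contains a word commonly seen in spam."""
    text = subject + " " + body
    return re.search(r"\b(free|guaranteed)\b", text, re.IGNORECASE) is not None


def mostly_uppercase(subject, body):
    """Rule: more than 80% of the letters in the subject are uppercase."""
    letters = [ch for ch in subject if ch.isalpha()]
    if len(letters) < 5:
        return False
    return sum(ch.isupper() for ch in letters) / len(letters) > 0.8


def many_exclamations(subject, body):
    """Rule: three or more exclamation points in the whole message."""
    return (subject + body).count("!") >= 3


# Hypothetical weights and threshold, chosen only for this example.
RULES = [(has_spam_phrase, 2.0), (mostly_uppercase, 1.5), (many_exclamations, 1.0)]
SPAM_THRESHOLD = 3.0


def spam_score(subject, body):
    """Add up the weights of every rule the message triggers."""
    return sum(weight for rule, weight in RULES if rule(subject, body))


def is_spam(subject, body):
    """Classify as spam once the total rating reaches the threshold."""
    return spam_score(subject, body) >= SPAM_THRESHOLD
```

      With these assumed weights, a subject of "FREE MONEY NOW!!!" triggers all three rules, while an ordinary note asking "Are you free at noon?" triggers only the phrase rule and stays under the threshold.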

      The performance of rule-based filters varies widely. The simplest just reject any email that contains certain "bad" words, and are easy for spammers to outwit by using minor variations in spelling. On the other hand, sophisticated rule-based filters can be quite effective, catching 90-95% of spam.

      The main disadvantage of rule-based filters is that they tend to have high false positive rates, often as high as 0.5%. (A trained Bayesian filter's false positive rate would be less than a tenth of that.)

      Another disadvantage is that the rules are static. When spammers learn new tricks, the filter's authors have to write new rules to catch them. And because rule-based filters are static targets, spammers can tune their mails to get past them. Sophisticated spammers already test their mails on popular rule-based filters before sending them. In fact, there are sites that will do this for free.

      The advantage of rule-based filters over Bayesian filters is that they're easy to install at the mail server level. Bayesian filters require users to train them by telling them when they misclassify an email, so running one on the server is a little more complicated (but probably worth it).

     

    Bayesian (a.k.a. Statistical) Filtering

      Good: Catch 99% to 99.9% of spam. Low false positives. Adapts automatically as spammers change techniques.
      Bad: System has to be "trained".
      Role: Best current solution for individual users.

      Bayesian filters are the latest in spam filtering technology, only becoming widespread in 2003.

      Bayesian filters recognize spam by examining the words (or "tokens") they contain. A Bayesian filter starts with two collections of mail, one of spam and one of legitimate mail. For every word in these emails, it calculates a spam probability based on the proportion of occurrences in the two "samples". For instance, for one individual's email "profile", "Guaranteed" could have a spam probability of 98%, because it occurs mostly in spam; "This" could have a spam probability of 43%, because it occurs about equally in spam and legitimate mail; and "deduce" could have a spam probability of only 3%, because it occurs mostly in legitimate email.
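
      The per-word calculation might look like the sketch below. The exact formula is an assumption, since the text says only that the probability is based on the proportion of occurrences in the two samples: here each word's rate in spam is divided by the sum of its rates in spam and legitimate mail, then clamped to the range 0.01 to 0.99 so no word is treated as absolute proof. Both collections are assumed to be non-empty.

```python
from collections import Counter


def tokenize(message):
    """Split a message into a set of lowercase words (one count per message)."""
    return set(message.lower().split())


def word_probabilities(spam_messages, legit_messages):
    """Estimate, for each word, the probability that a mail containing it is spam."""
    spam_counts = Counter(t for m in spam_messages for t in tokenize(m))
    legit_counts = Counter(t for m in legit_messages for t in tokenize(m))
    probabilities = {}
    for word in set(spam_counts) | set(legit_counts):
        # Fraction of each collection's messages that contain the word.
        spam_rate = spam_counts[word] / len(spam_messages)
        legit_rate = legit_counts[word] / len(legit_messages)
        p = spam_rate / (spam_rate + legit_rate)
        # Clamp so that a single word is never treated as certain evidence.
        probabilities[word] = min(0.99, max(0.01, p))
    return probabilities
```

      Trained on a handful of messages, a word seen only in spam ends up at 0.99, a word seen only in legitimate mail at 0.01, and a word common to both somewhere in between.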

      When a message arrives, the filter collects the 15 or 20 words whose spam probabilities are furthest (in either direction) from a neutral 50%, and calculates from these an overall probability that the email is spam. For example, not every email that contains the words "free" and "cash" is spam. The Bayesian method would find the words "cash" and "free" interesting, but it would also consider the name of the business contact who sent the message and thus classify the message as legitimate. (Keyword checking, by contrast, classifies a mail as spam on the basis of a single word.) Bayesian filters are extremely accurate, and are particularly good at avoiding "false positives", that is, legitimate email misclassified as spam. This is because they consider evidence of innocence as well as evidence of guilt. A Bayesian filter is unlikely to reject an otherwise innocent email that happens to contain the word "sex", as a rule-based filter might.
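
      The selection and combination step can be sketched as below. The text does not give the combining formula, so this sketch assumes the common naive Bayes combination, in which the product of the word probabilities is weighed against the product of their complements. The function names and sample numbers are hypothetical, and every probability is assumed to lie strictly between 0 and 1.

```python
import math


def most_interesting(probabilities, limit=15):
    """Pick the probabilities furthest from a neutral 0.5, in either direction."""
    return sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)[:limit]


def combined_spam_probability(token_probabilities, limit=15):
    """Combine the most interesting word probabilities into one overall score."""
    chosen = most_interesting(token_probabilities, limit)
    spam_product = math.prod(chosen)
    legit_product = math.prod(1.0 - p for p in chosen)
    return spam_product / (spam_product + legit_product)
```

      For instance, two words at 0.9 (say "free" and "cash") combine to roughly 0.99 on their own, but adding two strongly legitimate words at 0.01 and 0.05 (say a known contact's name and a familiar business term) pulls the overall result down to about 0.04, which is the evidence-of-innocence effect described above.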

      A Bayesian filter is more difficult to trick than a keyword filter. By learning from new spam and new valid messages, the Bayesian filter constantly evolves and adapts to new spam techniques. For example, when spammers started using "f-r-e-e" instead of "free", they succeeded in evading keyword checking until "f-r-e-e" was also included in the keyword database. The Bayesian filter, on the other hand, automatically notices such tactics; in fact, if the word "f-r-e-e" is found, it is an even better spam indicator. Another example would be using the word "5ex" instead of "Sex". A spammer who wants to trick a Bayesian filter can either use fewer "trigger" words (i.e., words that usually indicate spam, such as "free", "Viagra", etc.), or more words that generally indicate valid mail (such as a valid contact name). Doing the latter is impossible because the spammer would have to know the email "profile" of each recipient. Using neutral words would not work, since these are disregarded in the final analysis. Breaking up spam words (using "f-r-e-e" instead of "free") will just increase the chance of the message being classified as spam, since a legitimate user will rarely write the word "free" as "f-r-e-e".

      Bayesian filters adapt to the user. Because Bayesian filters learn to distinguish spam from legitimate mail by looking at the actual mail sent to each user, they automatically customize themselves to each individual's message content and email style.

      Bayesian filters are multi-lingual and international. A Bayesian filter, being adaptive, can be used with any language. Most keyword lists are only available in English.

      The disadvantage of Bayesian filters is that they need to be trained. The user has to tell them whenever they misclassify a mail. Of course, after the filter has seen a couple hundred examples, it rarely guesses wrong.

     

    Challenge-Response Filtering

      Good: Stops 99.9% of spam.
      Bad: Rude. Delays or deletes legitimate email.
      Role: Individual users.

      When you get an email from someone you haven't had mail from before, a challenge-response filter sends an email back to them, telling them they must go to a web page and fill out a form before the email can be delivered.
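
      The hold-and-challenge flow can be sketched as a small class. Everything here is a hypothetical illustration: the class and method names are invented, the token stands in for the link to the web form, and actually sending the challenge email is left out.

```python
import secrets


class ChallengeResponseFilter:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.known_senders = set()
        self.pending = {}  # challenge token -> (sender, held message)

    def receive(self, sender, message):
        """Deliver mail from known senders; hold the rest and issue a challenge."""
        if sender in self.known_senders:
            return ("delivered", None)
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        return ("challenged", token)

    def confirm(self, token):
        """Called when the sender completes the form; releases the held mail."""
        held = self.pending.pop(token, None)
        if held is None:
            return None  # unknown or already used token
        sender, message = held
        self.known_senders.add(sender)
        return message
```

      A message whose challenge is never answered simply stays in the pending store, which is how legitimate mail ends up delayed or lost.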

      The advantage of challenge-response filters is that they let through very little spam. The disadvantage is that they're rude. By using a challenge-response filter, you are expecting the extra work of keeping your inbox free of spam to be done by the people who send you mail.

      The other disadvantage of challenge-response filters is that much legitimate mail will either be lost, or delayed until the sender responds to the challenges, which may make the message too late to be useful. Senders may choose not to reply to the challenge, and the email they sent you will be lost.

      Challenge-response filters can be suitable for light email users, who only get email from a few different addresses. They might also be good in combination with other kinds of spam filters. For instance, you could challenge a message blocked by a Bayesian filter, to help detect and correct false positives.

     

      Other Filtering Techniques

      Many systems combine the techniques above with other checks, including whether

      • the sender is a known spammer
      • the sender domain is invalid (reverse DNS lookup)
      • the sender is providing misleading information in the message header
      • the sender is sending the email to a large number of users
      • the message is using scripts
      • the message is using image tags to track if the email was opened
      • the message contains an image with a high percentage of flesh-tones

 


Realtime Blackhole Lists (RBLs)

 

    Blacklists are databases of Internet addresses used in the configuration of email systems to block spam from known offending sources. The descriptions below from Mail Abuse Prevention System (MAPS) illustrate how this technology is used to block inbound spam. Email servers are configured to compare email sender addresses against these lists, and drop (delete) any messages from senders that are on the list.

      MAPS RBL (Realtime Blackhole List) is used for blocking IP addresses of known sources of spam. It is a list of IP addresses that have been shown to send spam, as well as those that support or allow the sending of spam.

      MAPS DUL (Dynamic User List) is used for identifying dynamically assigned IP addresses which should not be a source of email. When they are, it is often due to a virus or other type of illegal or unknown use of the end user's resources. Mass emailers often use compromised machines, often home computers, to connect directly to the mail servers of their targets, bypassing the usual ISP gateway. The victim of this compromise is often unaware that this illegal activity is taking place on their computer.

      MAPS RSS (Relay Spam Stopper) is a list of IP addresses of known insecure ("open relay") mail servers, used for identifying mail servers that could be used to relay spam and other email for third parties. A third-party relay, also known as an "open relay" or "insecure relay", is a mail server that will route mail for any third party to any other third party, no questions asked. Spammers often hunt for and abuse open relays in an effort to cover their tracks because they know their spam is unwelcome and unwanted.

      MAPS OPS (Open Proxy Stopper) is a list of IP addresses that have been used as an open proxy to relay spam, used for identifying IP addresses that provide proxy access to email transmission for third parties. An open proxy is where any port, other than the SMTP port (port 25), on a machine is used to route mail for a third party to another third party, no questions asked. Spammers hunt for, and can often create, open proxies, and the victim of this compromise is often unaware that this illegal activity is taking place on their computer.

      MAPS NML (Non-confirming Mailing List) is used for identifying IP addresses of operators who do not fully confirm the email addresses in their lists. It is a list of IP addresses known to be the sources of mailing lists that do not confirm that the owner of each email address has, in fact, granted permission to be included on the mailing list. Only by having a fully verified opt-in process can one be certain that the person signing up for the mailing list is, in fact, the person who rightfully controls the email address in question.
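
      Lists of this kind are commonly published over DNS: the mail server reverses the octets of the connecting IPv4 address, appends the list's zone name, and looks the result up, with any answer meaning "listed". The sketch below assumes that convention. The zone name dnsbl.example.org is a placeholder rather than a real list, and the resolve parameter is an illustrative hook so the lookup can be replaced in tests.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the list has a record for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on this list.
        return False
```

      For example, checking 192.0.2.99 against the placeholder zone queries the name 99.2.0.192.dnsbl.example.org.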

      Using Real-Time Blackhole Lists For Filtering Email, by John Young

     


    How to determine if your email server is Blacklisted

     

      According to their advertising, "MAPS carefully maintains all of its lists. All IP addresses included on our lists are carefully investigated and confirmed to be sources of spam. Before any listing, we try to contact the owner of the address and correct the issue. Addition to our lists is only as a last resort after an agreeable resolution cannot be made. As a result of our unique approach, legitimate email is not lost."

      However, not all blacklisting organizations are so careful, and email servers can end up being blacklisted due to misconfiguration, carelessness, or by accident. Before adding an email server to their list, the blacklisting organizations typically, though not always, will attempt to contact the "postmaster" at the offending email domain. If they don't, it is left to the sender to discover that messages are not reaching their intended recipients because the sender's email domain (the "@domain.com" part of the email address) matches an entry in the blacklist in use on the recipient's email servers, and the messages are being dropped (deleted without notice).

      If this happens to you, the following resources will help you determine if you're listed on any blacklists, determine which lists/organizations have the email domain blacklisted, and provide contact information for the blacklisting organization, so that you can request removal from the list:

      (Most of these utilities require that you know the IP address of your domain. To determine the IP address of your domain use SamSpade by entering your domain in the first field and clicking "Do Stuff".)

       

      Search multiple blacklists simultaneously:

       

      Search individual blacklists belonging to these organizations:

     


    Verifying your email server is not an Open Relay

     

     


    How to determine the true source of a Spam message

     

     


    Anti-Spam Resources

      UXN Spam Combat provides gateways to check some MAPS, Osirusoft, and Spamhaus blacklists

      Use the Spam Tester from Declude to test the effectiveness of your email system's anti-spam solution.

     


Copyright © 2003 Scientis