In the anti-spam community we spend an awful lot of time poring over headers, writing regular expressions to catch "ratware", and training Bayesian filters to do content analysis.  But while we gripe about spammers in our mailing lists and blog posts, we don't often describe their operations in detail: spam, for our Internets and purposes, pops out of some ether when it arrives at our mail servers.  This strikes me as a poor foundation for reasoning about the problem.  So let's stop talking about spam for a second, and talk about spammers.

Spammers have a community, no less than anti-spammers do.  It is present in underground IRC channels, peer networks over ICQ, and web-based forums such as <a href="http://www.specialham.com">SpecialHam</a> (where I lurked periodically and learned most of the following).  Spammers have an <i>infrastructure</i>: there are dozens of players involved in getting the latest offer for Mr. Wiggly pills to your mailbox, and this infrastructure is specialized by role.  Just as the anti-spam community has a handful of deep thinkers like Paul Graham, whose seminal essay <a href="http://www.paulgraham.com/spam.html">A Plan for Spam</a> kicked off the modern Bayesian filter experiment, spam has its deep thinkers too.  A tiny portion of both communities has the technical skill necessary to develop tools to further their interests, and these tools are both shared and sold.  A still small but larger portion of each community is expert at techniques for making maximum use of the available tools: for example, writing regular expressions for <a href="http://spamassassin.apache.org/">SpamAssassin</a>.  The majority of each community has no special technical skill and seeks turnkey solutions where you click two buttons and go.

Let's talk a little about the types of tools the spam community needs.

scrapers: A scraper is an Internet spider which collects email addresses.  It scans publicly available web pages, newsgroups, forums, etc., to find more targets for spam.  In the anti-spam community, we've advised people to be circumspect about publishing their email addresses and to use obfuscation tricks like patrickmckenzie@DELETE.example.com.  It should come as no surprise that many scrapers have adapted to these tricks, for example by using regular expressions which recognize / (AT) / as equivalent to /@/.  There are other ways of getting email addresses which do not involve scrapers; these are described below.
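For the curious, here is a minimal Python sketch of the kind of normalization step described above. It is also handy for checking whether your own obfuscated address survives it. The substitution patterns and the simplified address regex are assumptions for illustration, not taken from any real scraper, and the sketch makes no attempt to undo the DELETE-style trick.

```python
import re

# Illustrative patterns (assumptions): bracketed "(AT)" / "[dot]" style munging,
# matched case-insensitively, with any surrounding whitespace swallowed.
_AT = re.compile(r"\s*[\(\[]\s*at\s*[\)\]]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", re.IGNORECASE)

# A deliberately simple address pattern; RFC 5322 allows far more than this.
_ADDR = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(text: str) -> str:
    """Rewrite bracketed AT/DOT spellings back into '@' and '.'."""
    text = _AT.sub("@", text)
    return _DOT.sub(".", text)


def find_addresses(text: str) -> list[str]:
    """Return every address-shaped token found after normalization."""
    return _ADDR.findall(normalize(text))
```

For example, `find_addresses("mail patrick (AT) example (DOT) com today")` returns `['patrick@example.com']`, which is exactly why "(AT)" alone offers so little protection.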

verifiers: A verifier takes as input a list of email addresses and returns as output the sublist of those addresses which can actually be delivered to.  There are a variety of methods for accomplishing this, and many of them involve actually mailing the inboxes in question.  Many mail servers will now refuse connections if you deliver too many messages to invalid inboxes within a given period, which makes this technique risky from the spammer's perspective (it will cost him the use of his IP address; more on this later).  As a result, most verifiers tend to be custom-built, domain-specific pieces of software and, if they target a large domain, very valuable (prices range in the hundreds of US dollars).  One example of a strategy for verifying email addresses in a very valuable domain is, for AOL (http://www.aol.com), creating multiple dummy AIM accounts and, over the period of a week, noting which addresses on the list come online from AOL clients.  AOL of course has countermeasures in place, and these programs generally don't have a very long effective lifespan.  They also don't need one, as they're largely produced in Eastern Europe, where the prospect of thousands of dollars of payoff for selling a successful script can motivate an awful lot of talent.
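On the receiving side, the throttling mentioned above can be sketched as a sliding window of invalid-recipient attempts per connecting IP. This is a minimal Python sketch under stated assumptions: the class name and the default thresholds (5 invalid recipients in 10 minutes) are hypothetical, and real MTAs expose this through their own configuration rather than code like this.

```python
from collections import defaultdict, deque


class InvalidRecipientLimiter:
    """Hypothetical limiter: an IP is blocked once it has `max_invalid`
    unknown-recipient attempts inside the last `window` seconds."""

    def __init__(self, max_invalid: int = 5, window: float = 600.0):
        self.max_invalid = max_invalid
        self.window = window
        # ip -> timestamps (oldest first) of recent invalid-recipient attempts
        self._hits: dict[str, deque] = defaultdict(deque)

    def _prune(self, ip: str, now: float) -> deque:
        """Drop timestamps that have slid out of the window."""
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        return hits

    def record_invalid(self, ip: str, now: float) -> None:
        """Note one delivery attempt to a nonexistent mailbox."""
        self._prune(ip, now).append(now)

    def is_blocked(self, ip: str, now: float) -> bool:
        """True while the IP is at or over its allowance of invalid attempts."""
        return len(self._prune(ip, now)) >= self.max_invalid
```

Passing `now` explicitly keeps the sketch testable; a server would pass `time.monotonic()`. Once the spammer's IP trips the limit, every further probe from it is wasted, which is the cost the paragraph above describes.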
 

mailers: Then there are the mail agents which actually send you the mail.  A spammer has several operational requirements which are not dissimilar to those of a person running a large mailing list: his software must purge bounced addresses from his list, generate an incredibly large number of emails in a short amount of time, and so forth.  There are other requirements imposed by being in the "biz": the mailer cannot be detectable by spam filters which look for telltale signs of "ratware" (the anti-spam community's derisive label for spamming software).  For example, ratware which imitates a genuine email program (such as Thunderbird) but places its headers out of order will get its messages bounced almost automatically by systems employing SpamAssassin or similar rule-based techniques.  The mailer must also avoid anti-spam countermeasures such as RBLs (blacklists of known spammer IPs), which generally entails using rotating IP addresses from "bullet-proof hosting" (see below) or distributing the mailing across a botnet (see below).
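An RBL is usually consulted over DNS (the convention is written up in RFC 5782): the receiving server reverses the octets of the connecting IPv4 address, appends the list's zone, and looks the name up; an answer means the address is listed, while a "no such name" error means it is not. Here is a minimal Python sketch. The default zone is only an example, and a production client would also inspect the returned 127.0.0.x address, since some lists use special return codes for errors or refused queries.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name queried for an IPv4 address: octets reversed, then the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the zone answers with an A record, i.e. the IP is on the list.

    Simplification: any answer counts as "listed"; the return code is ignored.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN (or other resolution failure): treat as not listed
        return False
```

This is why a spammer sending from one fixed address is quickly shut out, and why rotating addresses or a botnet's thousands of residential IPs are so attractive to him.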

botnet: Not technically a piece of software, a botnet is a network of computers which have been subverted by a trojan, virus, or other security exploit.  A computer so afflicted is called a zombie.  Botnets are generally controlled over dedicated IRC channels.  Spammers generally buy access to botnets from virus writers or from others in the spam community who take existing viruses and modify their payloads to include code capable of zombifying the machine.  A variety of open-source tools exist for bundling an arbitrary payload with a given exploit, greatly lowering the barrier to entry to this market: one example is <a href="http://www.metasploit.com/">Metasploit</a> (which is, incidentally, aimed at "white hat" penetration testers; much nastier packages lurk in dark corners of the Internet).

Stay tuned for our next installment, where we cover money, the driving force behind the spam community. 
