Not that I am really interested in R&D on spam filtering, but this American Scientist article by Brian Hayes is quite interesting from a cultural point of view. It basically describes spam as a social and economic phenomenon rather than a technological one, and uses an immunological metaphor to explain it ("where the contest is between a host organism and pathogens or parasites, and where both sides have to adapt and evolve in order to survive").
"If e-mail containing the word "Viagra" is blocked, there are other ways of getting the idea across, including synonyms and circumlocutions ("sildenafil citrate," "impotence meds," "the little blue pill"). An adaptive filter will soon flag these terms as well, but by then the spammer can move on to other options. For some kinds of variation—such as obfuscatory misspelling along the lines of "V1@gra"—computational methods could automate the generation of random variants. (...) So how many ways can you spell Viagra? The question is addressed directly by an amusing Web page, created by Rob Cockerham of Sacramento, whose title announces: "There are 600,426,974,379,824,381,952 ways to spell Viagra." (...) When I first noticed spam with aberrant spellings, I assumed that someone out there in the murky world of spam service providers had written a program to generate random variants (...) I still suspect that such random-spelling generators exist in the spam world, but the evidence of my own inbox suggests they are not widely used. The telltale mark of their use would be a peculiar abundance of hapax legomena—the lit-crit term for words that appear only once in a corpus"
Why do I blog this? Cultural aspects of "teh web".