Armoring Spam Against Anti-Spam Filters

moggyf points to a BBC article about how spam can be successfully tweaked to slip past current filtering methods, excerpting "To find out how to beat the filters Mr Graham-Cumming sent himself the same message 10,000 times but to each one added a fixed number of random words. When a message got through he trained an 'evil' filter that helped to tune the perfect collection of additional words." iluvspam adds "It's an interview with POPFile author John Graham-Cumming that summarizes his talk at the recent MIT Spam Conference. You can still listen to the technical details here (choose the Afternoon 1 session, he starts about 75 minutes in)."
This discussion has been archived. No new comments can be posted.

  • by rmohr02 ( 208447 ) <mohr.42@osu. e d u> on Wednesday February 04, 2004 @11:21AM (#8179656)
    POPFile [sf.net], maintained by John Graham-Cumming, is the best spam filter I've used. There may be small flaws with the fundamental concept of Bayesian filters, but POPFile still blocks all my spam.
  • Great (Score:3, Interesting)

    by Polkyb ( 732262 ) on Wednesday February 04, 2004 @11:23AM (#8179684)

    I don't mind him trying to defeat the filters, if it comes up with a method of improving them, but the BBC should be shot for including the words that made it through

    Guess which words all of tomorrow's SPAM will contain...

  • by Channard ( 693317 ) on Wednesday February 04, 2004 @11:24AM (#8179687) Journal
    Mozilla's filtering catches most spam for me, but some gets through. However, the only one that actually fooled me was quite a sneaky one - headed RE: Question from E-Bayer or whatever the actual subject is where you E-Bay something. Given that I sell on E-Bay, the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at.
  • by Anonymous Coward on Wednesday February 04, 2004 @11:25AM (#8179694)
    I hate to see mainstream media coverage of this practice. I have started to get a lot of these spams lately.

    Typically they include a large image at the top which is the entire intended content of the message and then a bunch of dictionary words at the bottom. It's basically impossible to filter these out unless you filter out ALL HTML e-mail because they don't contain any typical spam text.
  • by aussersterne ( 212916 ) on Wednesday February 04, 2004 @11:27AM (#8179711) Homepage
    I have received piles of these recently. The names, item, item number, and amount change randomly, but it is always structured like a legitimate eBay message. I'm nervous about adding them to my bayesian filtering because I don't want to miss any eBay messages. I, too, sell a lot on eBay...
  • Re:nice name (Score:4, Interesting)

    by JohnGrahamCumming ( 684871 ) * <slashdot@jgc.oERDOSrg minus math_god> on Wednesday February 04, 2004 @11:29AM (#8179737) Homepage Journal
    Yes, that's a constant problem for me (and anyone else named Cumming or Cummings in the world). For example I can't get a Hotmail email account because of my name, but I did manage to sign up [usethesource.com] an account using the name Ivana Watch-Teens-Give-Head :-)

    John.
  • Re:Ok fuck it (Score:3, Interesting)

    by cperciva ( 102828 ) on Wednesday February 04, 2004 @11:30AM (#8179743) Homepage
    You do realize you've just committed a pretty serious Federal crime, don't you?

    He hasn't, actually -- those laws don't apply extraterritorially, and Tom's in Canada.
  • by Threni ( 635302 ) on Wednesday February 04, 2004 @11:31AM (#8179746)
    What, exactly, is wrong with the `make it computationally expensive to send email` solution Microsoft and others have proposed?
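For readers who have not seen the "computationally expensive" proposal, here is a minimal hashcash-style sketch of the idea. It is not Microsoft's actual scheme: the stamp format, the use of SHA-256, and the function names `mint_stamp` and `verify_stamp` are all assumptions made for illustration. The sender burns CPU searching for a nonce; the receiver checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient: str, bits: int = 12) -> str:
    """Sender side: try nonces until the hash has `bits` leading zero bits.

    Expected cost is about 2**bits hash evaluations per recipient.
    """
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        digest = hashlib.sha256(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify_stamp(stamp: str, recipient: str, bits: int = 12) -> bool:
    """Receiver side: one hash, plus a check that the stamp names this recipient."""
    if not stamp.startswith(recipient + ":"):
        return False
    digest = hashlib.sha256(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits
```

The asymmetry is the point: a stamp is cheap for one message but adds up across millions of recipients. The usual objections are that zombie machines give spammers free CPU and that legitimate mailing lists pay the same cost.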
  • Educate the people (Score:2, Interesting)

    by Theresa1 ( 748664 ) on Wednesday February 04, 2004 @11:31AM (#8179747) Homepage Journal
    When I was on holiday in Tunisia, we were bothered quite a lot by trinket salesmen, who would not take no for an answer. Initially we had a lot of difficulty getting rid of them because my kids kept wanting me to buy the trinkets. <praying hands> plleeeese !!!!!!!! can we have one ? </praying hands>. Eventually even my kids got fed up with them, and a united front defeated them. Anyway my point is, eventually the whole world will wise up and just ignore spam. There will be no incentive for companies to pay the spammers, and they'll just go away. It might take a while though.
  • Why bother? (Score:2, Interesting)

    by nakedbonzai ( 618338 ) on Wednesday February 04, 2004 @11:31AM (#8179751)
    I am still perplexed as to why a spammer wants to bypass someone's spam filter. Obviously, the person will simply delete any spam that gets through. They won't read it, they won't buy the product in question! Well, that's the case for me at least. I'd imagine the .001% of people who do respond to spam have no intention of ever using a spam filter.
  • by Alien54 ( 180860 ) on Wednesday February 04, 2004 @11:38AM (#8179817) Journal
    [hit the submit key too fast ....]

    The keywords would be different for each person.

    But I suppose you could discover a select set of keywords for specific demographics, if you defined them very precisely. This would move spam out of the normal "spew it everywhere" phase, where they would have to pay for real marketing data.

    Which sort of misses the point of free advertising in the first place, at least for the small guy. Of course, the big boys can pay for this sort of thing.

  • Why is everyone surprised that every technique designed to eliminate spam can be fought? It's obvious that this is going to happen.

    The question should be: how do we live in a world where 99.9(n)% of email is spam? When the virus writers and zombie masters and spysters start using their communications infrastructure for its intended goal of delivering advertising?

    It's inevitable, and no amount of spam filtering will avoid it.

    Here's a prediction I made maybe 6 months ago on Slashdot: we're going to start seeing viruses that modify real outgoing emails to include their advertising messages. (And no Outlook jokes, thanks...) How does one filter spam when real emails are also infected?
  • by Scodiddly ( 48341 ) on Wednesday February 04, 2004 @11:40AM (#8179839) Homepage
    "the spammers must have taken a gamble that enough people would read the subject and deem it worth looking at"

    A lot of spam works that way. I get stuff headed "Re: your account", "Credit Card Overdue", etc. Spammers accept incredibly low response rates, because sending is so cheap. So the chances are that they're going to have some header you really don't want to filter.

    The odds are almost good enough that perhaps someday they'll randomly send me (and many other people) a header with my own credit card number, just by blind chance.
  • by DocSnyder ( 10755 ) on Wednesday February 04, 2004 @11:40AM (#8179842)
    What they can't hide is the spamvertised target, as they want their victims to click onto a link and order something. Now you can resolve a link's IP address and check it against some common DNSBL blacklists (most spamvertised hosts are listed on SBL, SPEWS or chinanet.blackholes.us), or extract its domain and test it against some RHSBL or manual lists.

    What is more, if you multiply Bayesian or "word list" spam scores with results obtained with other methods, spammers may put "non-spammy" words into their spams as they like, but they only score their crap up instead of down.
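The link check described above could be sketched roughly as follows. This is only an illustration: the function names are made up, the zone name in the usage note is a placeholder, and the DNS lookups are passed in as callables (`resolve`, `exists`) so the logic can be read and tested without touching the network. The one standard piece is the DNSBL query convention of reversing the IPv4 octets and appending the list's zone.

```python
import ipaddress
import re
from typing import Callable, Iterable, Set
from urllib.parse import urlsplit

# Rough URL matcher; stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_hosts(body: str) -> Set[str]:
    """Collect the hostnames of every http(s) link in a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def spamvertised(
    body: str,
    zone: str,
    resolve: Callable[[str], Iterable[str]],  # hostname -> IPv4 strings
    exists: Callable[[str], bool],            # does this DNS name resolve?
) -> bool:
    """True if any linked host resolves to an address listed in the zone."""
    for host in linked_hosts(body):
        for ip in resolve(host):
            if exists(dnsbl_query_name(ip, zone)):
                return True
    return False
```

In real use `resolve` and `exists` would wrap DNS queries against whichever blacklist you trust; a hit can then be combined with the Bayesian score so that padding words no longer help the spammer.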

  • by Jerf ( 17166 ) on Wednesday February 04, 2004 @11:45AM (#8179885) Journal
    Well, I may not have made it into the BBC but my attack is much more effective and much, much harder to defend against: Bayes Attack Report [jerf.org].

    It even counters the "personalization" quality of Bayes filters by finding the "common core" of personalization that we all share.

    Fortunately, spammers continue to be too stupid to understand this attack. Last time I posted this on Slashdot I got joe jobbed [jerf.org], because apparently it's easier to do that than to actually figure out what I was talking about.

    In summary, I wouldn't worry about your Bayes filters for a while: While they are attackable, spammers are too stupid to understand the attacks. (My article has been posted for over a year.) Thank goodness, sort of. (This will eventually be a temporary situation... but I see no particular evidence that the breakthrough will happen anytime soon.)
  • by andih8u ( 639841 ) on Wednesday February 04, 2004 @11:45AM (#8179894)
    I think whitelists end up discouraging quite a few legitimate users as well as spammers. I've received emails from people asking questions about this or that, I hit reply, and get shot back a message saying that I have to ask their permission to send them an email, even though I'm replying to them. I dunno if they're not setting up their whitelist properly to automatically add any address they send mail to, but I'm not going to hassle with writing out a reply to them, then having to go back a few minutes later and ask their permission to respond to the message they sent me in the first place.
  • Re:Ok fuck it (Score:3, Interesting)

    by Gaijin42 ( 317411 ) on Wednesday February 04, 2004 @11:47AM (#8179903)
    Well, since this is an international forum, he has an out. But if it could be shown that he was soliciting someone to do that crime in the US, even if he did the solicitation from Canada, it would still be a crime in the US.

    At a minimum, he would be arrested if he came to the states. However, if someone actually went through with the crime, I'm sure Canada would be willing to extradite him. Canada doesn't want maniacs running around free, anymore than the US does.
  • by kent_eh ( 543303 ) on Wednesday February 04, 2004 @11:51AM (#8179937)
    One thing we can do is to make the spammers==virus_writers connection every time anyone asks us about (or even mentions) viruses.

    Aren't we the ones our friend(s) and co-workers ask about computer stuff?

    I have taken this a step further and contacted a few "computer journalists" locally and suggested that they make the spam/virus connection the next time they are writing about the latest virus. It's natural to answer the question 'where do these viruses come from' when talking about the latest scourge of the internet.
  • Re:Hmmm... (Score:2, Interesting)

    by BigBadBri ( 595126 ) on Wednesday February 04, 2004 @11:53AM (#8179953)
    Have you tried reducing the significance of your 'ham' list, to see if the spammer's analysis is made more difficult?

    Granted, it may increase the number of false positives, but a relatively small change in the values assigned to 'ham' words might make a big difference to the amount of work required by the spammer.

    I'm not an expert on Bayesian filtering, but I seem to remember that there were a few tweakable parameters.

  • Spam - CounterSpam (Score:2, Interesting)

    by Aumaden ( 598628 ) <Devon.C.Miller@nOsPaM.gmail.com> on Wednesday February 04, 2004 @11:57AM (#8179979) Journal
    I have opted to wage a personal war against spammers. Here's my battle plan:

    Roughly once each week, I go fishing through the spam that has been filtered out of my various accounts for URLs. (Sometimes this involves a little digging to get to the final site.) I extract the host names from the URLs and for each hostname, I create 10 fake email addresses.

    I pack these emails into messages that I post to Usenet in groups likely to be trolled by Spammers. The spammers scrape these addresses from Usenet and add them to their database. Thus, future mailings will also spam the spammer's clients.

    If enough people do this, the generated traffic will begin to overload the client's mail server. After a while the spammer's clients will figure out that every time they employ a spammer, they themselves get spammed.

    Even if nothing comes of this, I get the satisfaction of knowing the real perpetrator (the spammer's client) gets to share some of my pain.

  • Re:Ok fuck it (Score:3, Interesting)

    by Ineffable 27 ( 203704 ) on Wednesday February 04, 2004 @12:00PM (#8179999)
    No true jury of his peers would convict him, since chances are they're sick of spam too! :)
  • Re:Ok fuck it (Score:3, Interesting)

    by theLOUDroom ( 556455 ) on Wednesday February 04, 2004 @12:07PM (#8180060)
    yeah lets just go around beating up spammers. no trial, just vigilante justice. why stop there? lets go around beating up anyone we dont like. screw the court system. i dont like evil conservatives, lets just kill them. no trial, no evidence necessary.

    [sarcasm]Yeah, let's just trust the government to take care of every aspect of our lives and never go against anything it says.[/sarcasm]

    Saying something's "vigilante justice" doesn't automatically make it bad. In order to make that conclusion, you have to start with the assumption that the gov't will always do the right thing.
    Since that's not the case, one must realize that sometimes the rules need to be broken and other solutions applied to the problem.

    Look at it this way:
    You live in a country named dystopia. In this country rape is legal. Every day on the way to school, your daughter gets raped by the same guy. You go to the police, but they do nothing about it because it's not illegal. You try to get a law passed but it gets knocked down. This rape is causing your family real harm every day. How long are you going to wait before you resort to vigilante justice?.....and more importantly is it a bad thing when you do?

    Now back to the spam problem:
    Spam is pretty much legal (the CAN-SPAM Act was a joke...it made things worse). The gov't is doing basically nothing to stop it. It is causing real harm to internet users around the world. Now I'm not necessarily saying that vigilantism is the answer, but what I am saying is that your response is an extremely oversimplified view of the world.


    The law is not always right, nor is it carved in stone. Sure, society is supposed to follow the law, but the law is also supposed to follow society. The law is not this thing a guy came down from a mountain and handed us. It is a constant tug-of-war.
  • Re:Hmmm... (Score:2, Interesting)

    by ichimunki ( 194887 ) on Wednesday February 04, 2004 @12:08PM (#8180068)
    (sorry for the dupe, didn't intend to post as AC the first time)

    It's not rocket science. The statistical filter I've been writing doesn't ignore random words in general (during scoring they just get counted like any other token), but it will ignore them on incoming mail.

    I think trying to classify email as spam/not-spam based on characteristics (which you seem to be suggesting) is a big waste of time. Have you ever tried to wade through Spam::Assassin to see what it actually does? It's painful... and not just because it's written in Perl. Trying to classify based on rules is an arms race with the spammers.

    I'm in the process of replacing S::A with about 100 lines of Ruby code. I stopped using S::A immediately after I realized it had trashed emails from my daughter based on some broken-ness in her email client (the default client on a new Windows XP computer). Obviously the fault was mine for sending spam to the trash folder where it got deleted when I closed KMail, but I don't like that a default S::A called those mails spam in the first place. But it just points up the problems with rules-based filtering approaches.

    The hardest part of a statistical spam filter is not the math, but writing a good "tokenizer" routine. I think mine works well because I push HTML tags to the end and discriminate against header-tokens uniquely (as suggested by Paul Graham). By pushing HTML tags to the end I defeat the attempts by spammers to break up obvious spam words by infixing them with nonsense (i.e. non-displayed) HTML tags.
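A tokenizer along the lines described above might look like the sketch below. It is written in Python rather than the poster's Ruby, and it is not his code: the regular expressions, the `name*word` marking for header tokens, and the function name `tokenize` are assumptions chosen to illustrate the two tricks (header tokens kept distinct from body tokens, HTML tags stripped out of the text and appended at the end).

```python
import re

TAG_RE = re.compile(r"<[^>]*>")           # any HTML tag or comment
WORD_RE = re.compile(r"[A-Za-z0-9$'-]+")  # crude word matcher


def tokenize(headers: dict, body: str) -> list:
    """Turn a message into tokens for a statistical filter."""
    tokens = []

    # Header words are marked with their header name, so "free" in the
    # Subject line is a different token from "free" in the body.
    for name, value in headers.items():
        for word in WORD_RE.findall(value):
            tokens.append(f"{name.lower()}*{word.lower()}")

    # Pull the tags out first. Removing them rejoins words that were
    # split by invisible markup, e.g. "vi<b></b>agra" -> "viagra".
    tags = TAG_RE.findall(body)
    text = TAG_RE.sub("", body)
    tokens.extend(word.lower() for word in WORD_RE.findall(text))

    # The tags themselves still count as tokens, but only at the end.
    tokens.extend(tag.lower() for tag in tags)
    return tokens
```

One trade-off of stripping tags with nothing in their place is that adjacent block elements such as `<p>hello</p><p>world</p>` get glued into one word; a fuller version would probably substitute a space for block-level tags only.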
  • Re:Tch tch... (Score:3, Interesting)

    by interiot ( 50685 ) on Wednesday February 04, 2004 @12:14PM (#8180124) Homepage
    Well, that's not necessarily ALWAYS true... for instance, most crypto is at least heavily mathematics based, and therefore is much easier to analyze from a purely theoretical standpoint how much CPU is required to break. And in some cases (eg. DES) a lot of theoretical work HAS gone into them to identify weaknesses and analyze exactly how much CPU is required to break a given key length.

    Just that certain technical protections are of the nature that it's not a "I try some random protection, the idiots and/or hackers try random ways to break in, with various techniques being better than others but we really only know by testing them out in the real world."

    But spam unfortunately doesn't fall into that area unless we completely remove anonymity from email, which isn't necessarily the greatest idea. Though I know there are academic proposals for ways to anonymously vote and anonymously send cash in ways that satisfy certain very important criteria (eg. one person can't vote more than once, the receiver of anonymous cash can't retrieve the cash twice from the sender's bank account, the sender can't send a given transaction twice, etc). Do any of these techniques apply to allowing anonymous individual mail and bulk solicited email using a technically verifiable method?

  • by Technician ( 215283 ) on Wednesday February 04, 2004 @12:28PM (#8180264)
    In the analog world many times if noise in a system is a repeating wave (hum in an audio line), it can be duplicated, inverted and added to the original to eliminate the noise and leave the signal.

    Apply this to a mail server. Hold all mail for about 5 minutes (from outside only). Compare them all. Look for matches of more than 50%. Cancel the matches out and filter the incoming for the same. This nails lots of the worms and spam by rejecting the common mode noise. Most spammers create a message and mass mail the same message, not create new messages for each recipient (except some boilerplate name use).
    Hotmail could catch a lot of spam this way and yank it out of mailboxes before they are retrieved and halt the remaining incoming very efficiently. Only the first few would make it past the filter, but then be recalled back out of mailboxes if the user hasn't retrieved them yet.

    Sending the same mail from dozens of relays would have no effect on the filter. Where it comes from simply doesn't matter. If it has a large portion that is a match, it's dead. Newsgroup mail lists would have to be white listed on a case by case basis.
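The hold-and-compare idea can be sketched with the standard library's `difflib`. This is a toy version under stated assumptions: similarity is measured word by word with `SequenceMatcher`, the 50% threshold comes from the comment, and `min_copies` (how many near-identical messages count as bulk) is an invented parameter. Pairwise comparison is quadratic in the number of held messages, so a real server would need something cheaper, such as hashing message fragments.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two message bodies, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def bulk_suspects(held: list, threshold: float = 0.5, min_copies: int = 3) -> set:
    """Return indexes of held messages that look like copies of each other.

    A message is a suspect when it and its near-duplicates together
    number at least `min_copies`.
    """
    suspects = set()
    for i, msg in enumerate(held):
        matches = [
            j for j, other in enumerate(held)
            if j != i and similarity(msg, other) > threshold
        ]
        if len(matches) + 1 >= min_copies:
            suspects.add(i)
    return suspects
```

Note that this is exactly the check the random-word padding in the article is meant to defeat: enough per-message filler pushes the similarity between copies below the threshold.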
  • by Julian Morrison ( 5575 ) on Wednesday February 04, 2004 @12:49PM (#8180460)
    http://spambayes.sourceforge.net/

    In particular, I like their "unsure" categorization. All the "false positives" go in there, and cleaning that one folder out regularly is easy.
  • Re:Ok fuck it (Score:3, Interesting)

    by swb ( 14022 ) on Wednesday February 04, 2004 @01:01PM (#8180551)
    They may be able to do that, but having JUST finished serving as a juror in a Federal criminal trial, I can tell you it wouldn't go over very well in most cases where there is strong evidence.

    In all likelihood the judge would declare a mistrial. I'm not familiar (we weren't told) with the judge's powers over a jury and what laws apply to jury conduct. It might be possible for the judge to declare the jury in contempt for disregarding the judge's instructions on how the law(s) are to be applied.

    It's not like you go to court and do whatever you want and interpret the law any way you want. The judge has total control of the rules of interpretation used by the jury. The court and the trial are kind of the judge's own little kingdom, and you mess with a federal judge at your own peril.
  • I am building my own (Score:3, Interesting)

    by Tablizer ( 95088 ) on Wednesday February 04, 2004 @01:33PM (#8180834) Journal
    Any spam filter used by more than a few thousand people will be dissected and used to make filter-proof spam by the spammers. I am sure Bayesian has lots of holes if you work hard enough to find them. Bayesian depends on consistency in patterns. If spammers ruin that consistency, it won't work.

    Just the other day I found one spam that used a white font to put in legitimate-sounding text that would not visually show up on the screen. The spam text was a mix of graphics and pieces of real text. Thus, the word "penis" might start out with "pen" and end with a graphic for "is". Bayesian might start looking for the word "pen" after a while, but by that time the spammers will have a new trick up their sleeve. For example, if it looks for white fonts, then spammers might start using slightly off-white fonts, or black fonts on a black background. The combinations are probably endless.

    Thus, by making my own, my gizmo is not the target of spammers. They don't know about my filter nor care.

    The only alternative I can see is filter vendors constantly changing their algorithms every month or so, which would probably get expensive and risky. It is not like virus checking software, where vendors mostly just add to their database and only tweak the algorithm a bit once every few years; it is like having to completely rewrite the virus filtering algorithms, not just the data.

    Ultimately, I think some sort of monetary postage system is the only effective solution. ISP and backbone makers will only have an incentive to track down spammers if they lose money on anonymous or forged spammers. This will make mass spamming far less lucrative.

    Either that, or people will eventually find out the hard way that penis enlargers don't work and stop wanting to refinance their house. (I wonder if I can refinance all those expensive penis enlargers that I bought?)
  • by Anonymous Coward on Wednesday February 04, 2004 @01:44PM (#8180915)
    Yeah, yeah, chicken little, the sky is falling.

    MS's idea is as harebrained as most of their solutions. SPF and filtering armours more than well enough against spam. With SPF, spammers will no longer be able to forge sender addresses -- you will be surprised what that'll do for their legitimacy.

    (And, it doesn't require something as completely fucking satanic as "estamps.")
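To make the SPF claim concrete, here is a heavily simplified sketch of how a receiving server might evaluate a domain's published policy against the connecting IP address. It handles only the `ip4`, `ip6` and `all` mechanisms and skips everything that needs further DNS lookups (`a`, `mx`, `include`, and so on), so it is an illustration of the idea and not a conforming implementation. The function name and the example record are made up.

```python
import ipaddress

# Qualifier prefix -> result name. A mechanism without a prefix means "+".
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, sender_ip: str) -> str:
    """Evaluate a (simplified) SPF record for the connecting IP address."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # the domain publishes no usable policy

    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()

        if mechanism == "all":
            return QUALIFIERS[qualifier]
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        # Other mechanisms would need DNS lookups and are ignored here.

    return "neutral"  # nothing matched and there was no "all" term
```

With a record such as `v=spf1 ip4:192.0.2.0/24 -all`, mail claiming to be from that domain but arriving from any other address evaluates to `fail`. That stops forgery of the envelope sender domain; it does nothing about spammers who register their own domains and publish permissive records.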
  • by Gzip Christ ( 683175 ) on Wednesday February 04, 2004 @01:53PM (#8180988) Homepage
    I will pay 1000$ to anyone who seeks out and beats the living daylights out of a spammer. With as many pics on the web as possible for posterity.
    How about putting that $1K towards a legal use and offer it as a bounty to anybody who tracks down a spammer, sues him, and gets him thrown in jail and/or bankrupts him (via court imposed fines)? It may not have the same immediate satisfaction that you were originally seeking, but it's far more legal and I think you could find plenty of people here on Slashdot to chip in some extra $ to raise the pot even higher.
  • Re:infinite monkeys (Score:2, Interesting)

    by joebok ( 457904 ) on Wednesday February 04, 2004 @02:02PM (#8181079) Homepage Journal
    I think it's more than no problem - what I believe he is saying is that a Bayesian filter will evolve some "ham" words that will carry an email into an inbox. They are individual and hard to figure out, but there is no reason why a spammer can't append your ham words, my ham words, and everybody else's ham words to the same message and thus bypass all our filters. So instead of the random "word salad" that we would see, we'd be getting a non-random selection of known ham words.

    Even if the HTML business didn't work, spammers still have a mechanism for gauging effectiveness - money. They can assume a fairly even distribution of suckers and start sending out groups of messages with random words and, with some analysis, probably eventually come up with some statistically significant ham words.

    Perhaps in addition to trading email addresses, ham word lists will also start to be traded. The anti-spam/spam industry will evolve like insurance and re-insurance : whoever has the best actuary will win.

    Over time the ham words would also change - I wonder if the fight against spam will start having a noticeable effect on our use of language?
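The effect of appending known ham words can be shown with a toy naive Bayes combiner. Everything numeric here is invented for illustration: the per-word spam probabilities, the 0.4 assigned to never-seen words, and the function name. Real filters differ in detail (many only look at the most extreme tokens), so treat this as a sketch of the principle, not of POPFile or any other product.

```python
from math import prod


def spam_probability(tokens, word_probs, unknown=0.4):
    """Combine per-word spam probabilities, naive-Bayes style.

    Words missing from `word_probs` get the assumed `unknown` value.
    """
    probs = [word_probs.get(token, unknown) for token in tokens]
    spammy = prod(probs)
    hammy = prod(1.0 - p for p in probs)
    return spammy / (spammy + hammy)


# Made-up training data for one user: two spammy words, three hammy ones.
WORD_PROBS = {
    "cheap": 0.95, "pills": 0.99,
    "meeting": 0.05, "compiler": 0.02, "thesis": 0.03,
}

plain = ["cheap", "pills"]
with_random = plain + ["zebra", "kettle", "oblong", "marsh", "plinth"]
with_ham = plain + ["meeting", "compiler", "thesis"]

for label, message in [("plain", plain), ("random", with_random), ("ham", with_ham)]:
    print(label, round(spam_probability(message, WORD_PROBS), 4))
```

In this toy setup five random words barely move the score, while three words the user's own filter treats as strongly hammy drag it below 0.5. That gap is why learning which words are ham for a given population is worth the spammer's effort.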
  • by WuphonsReach ( 684551 ) on Wednesday February 04, 2004 @02:02PM (#8181084)
    So, the spammers have to keep (and update) a word list for EVERY PERSON on their lists.

    That's one of the strengths of pushing bayesian filtering to as close to the final recipient as possible. Millions of customized bayesian scoring databases are much more difficult to defeat than a single centralized database. Bayesian databases are pretty much maintenance free, as long as the junk/not-junk/might-be user-interface is intuitive and makes life as easy for the user as possible.

    There is some value in putting the bayesian filtering at a workgroup level, where it helps that there's a bit of shared knowledge and everyone in the group pretty much agrees on their personal definition of what is/isn't spam. However, once you get past around 10-25 people, I'd say that bayesian is going to start becoming ineffective due to either over-zealous users, or overly-broad ham/spam classifications.

    What I'd be interested in is a bayesian that works both on the individual level and the workgroup level. With some sort of flag/switch/setting that tells the engine how much to consider the workgroup database as opposed to my personal database. This would be useful when adding a new member to the group, initially they'd rely heavily on the group's opinion as to what is ham/spam, but as time goes on it would adapt to their choices (as well as the group database slowly adapting to everyone else's).
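One way the personal/workgroup switch could work is a weight that grows with the amount of mail the user has classified. The sketch below is hypothetical: the linear ramp, the `ramp=200` default, and both function names are assumptions, and the two "databases" are plain dicts mapping a token to its spam probability.

```python
def blend_weight(personal_count: int, ramp: int = 200) -> float:
    """Trust placed in the personal database, from 0.0 (new user) to 1.0.

    `personal_count` is how many messages this user has classified so far.
    """
    return min(1.0, personal_count / ramp)


def blended_probability(token, personal, group, personal_count,
                        ramp=200, unknown=0.5):
    """Spam probability for one token, mixing personal and group opinion."""
    weight = blend_weight(personal_count, ramp)
    p_group = group.get(token, unknown)
    # If the user has never seen the token, defer entirely to the group.
    p_personal = personal.get(token, p_group)
    return weight * p_personal + (1.0 - weight) * p_group
```

For example, a loan officer joining a group where "mortgage" scores 0.9 would start out with the group's 0.9 and drift toward their own 0.1 as they train the filter; the ramp could just as well be exposed as the manual setting the comment asks for.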
  • by dsojourner ( 695863 ) <dsojourner&yahoo,com> on Wednesday February 04, 2004 @05:56PM (#8183572)
    The idea is to find words that someone needs to let through, and add them to your spam.

    Exactly which words will be a function of job, life style, income level ...

    So when I use my anti-anti-spam filter, I can generate lists of words that will target specific populations, w/o having to figure out who on my (huge) list of recipients is in which population.

    Big news ...
