Two Spam Filters 10 Times As Accurate As Humans
Nuclear Elephant writes "The authors of two spam filters, CRM114 and DSPAM, announced recently that their filters have achieved accuracy rates ten times better than a human is capable of. Based on a study by Bill Yerazunis of CRM114, the average human is only 99.84% accurate. Both filters are reported to have reached accuracy levels between 99.983% and 99.984% (1 misclassification in 6250 messages) using completely different approaches (CRM114 touts a Markovian classifier, while DSPAM implements a Dolby-type noise reduction algorithm called Dobly). If you're looking for a way to rid your inbox of spam, roll on over to one of these authors' websites."
Outclassed... (Score:5, Funny)
It _can't_ know which pr0n I think is spam vs good (Score:5, Funny)
How would it know if I consider brunettes non-spam but blondes spam? I did opt-in for one of those email categories, but not the other.
Re:Help setting this up (Score:5, Insightful)
Re:Help setting this up (Score:4, Informative)
Not sure exactly why you need a pop3 proxy involved, just use Fetchmail to deliver locally, run things through procmail.
Set your local mailserver (sendmail/qmail/postfix/exim/whatever) to use your ISP's SMTP server as a smarthost, and it'll send everything it doesn't recognize as local off to them to handle.
Comment removed (Score:5, Insightful)
Re:Huh? Aren't humans 100%? (Score:5, Informative)
I haven't been 100% accurate.
I received an email from my sister-in-law from her work, and the address looked suspicious (one of those weird-looking "letter and number" jumbles).
I deleted it. It happens.
*slams head against wall* (Score:5, Funny)
Yeah, so did I. The subject line was "I want you so bad."
I deleted it. Turned out the message was genuine. I'll never forgive myself...
Re:*slams head against wall* (Score:5, Funny)
Don't worry (Score:4, Funny)
Don't worry, I can forward you the one she sent me. Sounds like the same email.
Re:*slams head against wall* (Score:5, Funny)
Re:Huh? Aren't humans 100%? (Score:3, Informative)
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Fortunately, soon we will all be able to use the superhuman spam-detection capabilities of these filters to save us from ourselves. Imagine all of those pesky e-mails from your 'friends' getting caught by your spam filter before they even impinge upon your consciousness.
It'd be a wonderful world.
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Kinda makes you wonder how they can know the filters are right though.
(please don't reply telling me how)
Re:Huh? Aren't humans 100%? (Score:5, Funny)
Re:Huh? Aren't humans 100%? (Score:4, Insightful)
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Not comparable. The job of a junk mail filter is to drop things I don't want to read. It is trying to match my evaluation, not to match a semi-objective criterion like red or blue.
If I read 1000 messages and say which I wish I hadn't read, then I am 100% accurate by definition.
Of course, if they are really talking about a pure spam filter -- ie one which identifies unsolicited commercial email -- then they can be more accurate than me, but at an uninteresting, perhaps even counter-productive, task:
I may get unsolicited commercial email I do want to read one day. Almost happened once (I had inadvertently signed up for it, so it was not really unsolicited, and I didn't actually buy the piece of kit they had on special offer that week, but was tempted). I also get stuff I don't want which isn't spam (notably email from virus infected machines).
The referenced study seems to be a very sloppy job from this POV. They don't define what their criterion of success is, and to the extent they put in a hand waving attempt it is clearly nonsense:
`Unsolicited' does not imply `not desired'. If they don't tease those two apart, they can't get interesting results for real world applications. E.g., someone mailing my work address with a commercial proposition may well be a very welcome unsolicited commercial email.
Re:Huh? Aren't humans 100%? (Score:5, Informative)
Good question! We're working on this problem, among other things, at the PSAM [pdx.edu] project. We have a project to produce high-quality benchmark corpora for spam filter testing. Watch that space for ongoing work, or e-mail us an offer to pitch in and help---we could use it!
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Re:Huh? Aren't humans 100%? (Score:3, Informative)
If the program can have a
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Re:Huh? Aren't humans 100%? (Score:3, Funny)
Best phrase I've read all week... Oh, yeah, it's only Monday! This one will probably hold me over 'till Friday, though. ;-)
Re:Huh? Aren't humans 100%? (Score:4, Funny)
I work with someone who uses her computer every single day, has had an email address for years, and still buys what she reads in an email. Photoshop for $50...sure! Herbal viagra...why not?
Well, she always has a big smile on her face, maybe there's something to this spam thing.
Re:Huh? Aren't humans 100%? (Score:5, Funny)
You mean you've never noticed this before? Idiots are some of the happiest people I know.
Re:Huh? Aren't humans 100%? (Score:5, Funny)
Uh, sure they do. Popups - that's like those porn storms, isn't it? Some people say it only happens with IE and Windows, but I talked to my service provider and they told me 'just pull the power plug out of the wall when that happens'.
Easily fixed.
Re:Huh? Aren't humans 100%? (Score:5, Funny)
Ok, now the screen dimmed a little and I heard the hard drive spin down, but the pop ups are still a comin! Oh, and something about "battery level at 98%" or something.
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
With 10 messages (after automatic spam detection) humans are 100% accurate.
With 1,000 messages, (before automatic spam detection)
humans are less than 100% accurate.
The experiment was done on 5849 messages.
Remember; one thing computers are good at is doing boring things repeatedly.
Re:Huh? Aren't humans 100%? (Score:5, Interesting)
If you see a strange name in your inbox with an odd title, that might be a Nigerian businessman, or it might be your long lost Nigerian brother.
I recently tried to order a t-shirt from this guy for a band he used to be in. I found his band because we have the same (semi-uncommon) name. So, he got an email From: himself. I had to send him two emails because he deleted the first one assuming it was spam.
I ordered some RAM for my dad a while back. He gets 200 spam emails a day (email addy in resume & web page), and he deleted the confirmation email from the RAM vendor. The RAM never shipped, and it took us a week to figure out that there was a problem.
People make mistakes all the time. Why is this an unexpected result? People are jackasses. This should be obvious.
Re:Huh? Aren't humans 100%? (Score:3, Interesting)
Hence we have spam in the first place.
KFG
Re:Huh? Aren't humans 100%? (Score:4, Funny)
How could you possibly know? You deleted it!!
Re:Huh? Aren't humans 100%? (Score:3, Interesting)
But I was only contesting the great-grandparent poster, who said that humans are by definition 100% accurate.
While my dad may be an idiot, he is also human. I am correct, great-grandparent poster is incorrect, and you are off topic. As far as I can tell, I've never deleted an email I meant to keep either. But you and I aren't the only people worth discussing.
Re:Huh? Aren't humans 100%? (Score:4, Funny)
Or more likely, megacorp fires its mail administrators for being incompetent and goes on about its business.
Re:Huh? Aren't humans 100%? (Score:5, Interesting)
And if the study posted about is accurate, of those 1% that are left, you will (if you're a perfectly average person) accidentally delete 0.16% of good messages. Surely you've deleted a valid message by accident before? I do it regularly; deleting 25 spam messages with a single good one embedded among them when I've just woken up and haven't had my coffee is not a good thing ;)
At the very least, if you were given the same data as these tests, that would be true. Consider if you *didn't* use popfile - how many spams would you be deleting every day, and how many good messages would be accidentally deleted? I know that if I had to manually delete the two or three hundred spams interspersed with good messages, my false-positive rate (the percentage of good mail I accidentally deleted) would skyrocket.
So just be glad you've got popfile. Not only do you not have to go through as much spam, but you're also more accurate while going through the little you must.
Re:Huh? Aren't humans 100%? (Score:3, Insightful)
Re:Huh? Aren't humans 100%? (Score:4, Interesting)
Before I used a spam filter, I once missed a very important message whose subject line was something to the effect of "URGENT - DON'T REBOOT THIS MORNING." That was a bad one to miss.
Of course humans make mistakes, and it is entirely possible for an automated or semi-automated system to be more accurate than a human alone.
Re:Huh? Aren't humans 100%? (Score:5, Interesting)
That actually makes humans much more accurate. We can eliminate many of the messages just by looking at the subject.
The further question is, if humans aren't as accurate as the computer, how are they measuring the accuracy at all? That is, how do they know that the 1 in 6250 messages is wrong, if a human, known to be inaccurate, was testing for accuracy?
Re:Huh? Aren't humans 100%? (Score:5, Interesting)
I believe that humans can be 100% accurate (or thereabouts) if they read the *ENTIRE* message, however that's exactly the point - if you have to read an entire message to tell that it's spam, the spam has succeeded.
Their number probably concerns how people can tell without reading the entire message whether or not the message is spam. My brother accidentally deleted a few messages I had sent to him, however if he had read them fully he would have known they were legit.
Cheers,
Justin
Re:Huh? Aren't humans 100%? (Score:5, Insightful)
Re:Huh? Aren't humans 100%? (Score:5, Informative)
When these factors are considered, I think it's quite possible to write software that in the long run has a higher success rate than a human who has better things to do than filter his mail all day.
Re:Huh? Aren't humans 100%? (Score:4, Interesting)
Re:Huh? Aren't humans 100%? (Score:3, Informative)
Re:Huh? Aren't humans 100%? (Score:3, Informative)
Of course, so can I. Now, since I write the filter based on my human judgement of what constitutes spam, which is more accurate?
Re:Huh? Aren't humans 100%? (Score:5, Funny)
Now I have to think again about putting humans to decorticate sunflower seeds, it's cheaper than all those machines.
IM Spam (Score:5, Interesting)
wait, WTF? (Score:5, Insightful)
Re:wait, WTF? (Score:3, Interesting)
i'm sure there's spam out there that makes it seem like it's one of your friends talking to you (sending with "nick" or "john" as the sender name) and talks to you in a friendly manner about how great this product is.
i've got a few of those, but luckily
Re:wait, WTF? (Score:4, Interesting)
This represents 8 days worth of spam for me. Yes, ~800 per day.
My address has been valid for 10 years. Why should I change it? Bogofilter is currently letting 2-3 per day into my inbox. I generally check for false-positives, but as the training has progressed, I am finding none anymore.
I plan to implement a single-shot, one try notification sender. I.e., if the mail gets classified as spam: lookup the mx record for the envelope return address, if it's nonexistent, lookup the a record. Make a connection and try to deliver a message indicating their message (include subject reference) was identified as spam, include a way for them to reliably get a message through to me. If any of the smtp exchange or address lookup fails, just forget it, they're probably not real anyway.
Not the best idea (Score:5, Insightful)
What you're planning has already been done, it's called TMDA, and it's not such a good idea. You're going to send out 800 "challenge" emails per day - have you given any thought to how many of those will be genuine addresses, but have nothing to do with the spam you receive because they just happen to be the joe-job victim? These kinds of challenge/response systems may slightly alleviate your own suffering through spam, but at a cost to all those unfortunate enough to have had their email addresses faked. And if the sheer impoliteness of such net behaviour doesn't put you off, note that you're using up more of your own bandwidth to send out such challenges.
If any of the smtp exchange or address lookup fails, just forget it, they're probably not real anyway
It would make a lot more sense to make these kinds of checks when you're receiving the email in the first place. Reject at the SMTP level - you never accept and process the spam in the first place.
2+2=3 (Score:3)
Am I crazy or is that nowhere near "10 times better"?
Number of significant digits... (Score:5, Informative)
New proggie=99.984
So the human misses
Re:2+2=3 (Score:3, Informative)
1 -
1 -
A factor of 10 in reduced error rates
160 errors per 100 thousand vs 16.
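The arithmetic behind the "ten times" claim can be checked in a few lines of Python. This is only a worked example using the two percentages quoted in the story summary; the function name `errors_per` is made up for illustration.

```python
def errors_per(accuracy_percent, messages):
    """Expected number of misclassified messages out of `messages`."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return error_rate * messages


human = errors_per(99.84, 100_000)   # accuracy quoted for the average human
filt = errors_per(99.984, 100_000)   # accuracy quoted for the filters

print(round(human), round(filt))     # errors per 100,000 messages
print(round(human / filt, 1))        # ratio of the two error rates
print(round(100_000 / filt))         # one error in this many messages
```

With the quoted figures this works out to 160 versus 16 errors per 100,000 messages, a factor of ten in error rate, and one error in 6250 messages for the filters. The accuracies themselves differ by well under one percentage point, which is why the headline looks odd until you compare error rates instead.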
Re:2+2=3 (Score:5, Insightful)
You have just unlocked the secret of virtually every news report that says "ten times more likely."
To get cancer. To have a heart attack. To suffer from the heartbreak of psoriasis. Whatever.
Yes, these numbers indicate "10 times better," and if you were to ask the reporter how likely am I to avoid cancer in both situations, these are the sorts of numbers he would show you.
Eat health food and your chance of having a heart attack is 99.984%. Eat too many donuts and your chance of having a heart attack is 99.983%, 10 times worse!
Always, always, always ask to see the raw numbers so that you know what "10 times worse" means.
Then ask if the numbers were collected by phone survey. If they were, throw them all away and have a donut and a cup of coffee.
KFG
Re:2+2=3 (Score:3, Insightful)
I totally buggered that whole section, but it was just so funny I let it stand with the errata note that I had buggered it.
Ironically people know I "eat healthy," so I'm frequently asked where they should go to buy healthy food, to which I almost always reply:
"For God's sake man, whatever you do, don't go in the health food store!
"Well. . . where do I go then?"
"They've got these things now called
can it be used with SA? (Score:5, Interesting)
CB
Re:can it be used with SA? -yes (Score:5, Informative)
http://bugzilla.spamassassin.org/show_bug.cgi?i
Who is sending that one? (Score:5, Funny)
Forgive me if I don't feel any pity that some moron's email gets filtered to the junk bin because I couldn't discern it from spam.
To get this new spam filter... (Score:5, Funny)
Better (Score:5, Interesting)
less thought for me... (Score:3, Funny)
Hmmmm (Score:5, Funny)
i tend to think... (Score:3, Funny)
maybe some of those people just don't know where their 'del' key is, or what it does...
Obligatory Q... When will mozilla/TB have them? (Score:5, Interesting)
server side spamassassin, and a couple of simple procmail recipes. They have kept almost all the SPAM away.
However, it is good to see such good techniques becoming available and we can hope to see them as straight forward usable tools.
So, when will mozilla/TB (or your favourite server side or client side filter) get them?
S
actually (Score:5, Funny)
News story Headline (Score:3, Funny)
I've seen better stories in Highlights for Children
I'm sure they're great, but... (Score:5, Insightful)
Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
Your Buddy
and
Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
SpamKiddy
Even a human can't tell the difference. The only real difference is who they're from.
They're trying to sell you something (Score:3, Insightful)
Re:I'm sure they're great, but... (Score:3, Insightful)
Here's the real test (Score:3, Interesting)
If these filters can hit 99.99% with those, I'd be quite impressed.
Adaptive adversaries (Score:5, Insightful)
But when a single solution becomes mainstream, spammers will adapt to it. Bayesian filters tend to work very well, but now spammers are adding sprawls of randomly generated green-light text to offset the filter's score.
Google found an excellent way to rank websites, but then it became widespread enough that webmasters began to game the system it had created. It's been playing catch-up ever since.
Once the adversary begins to adapt, we lapse into the same cat-and-mouse game of technological barriers and counter-barriers that we've seen so many times before.
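For readers who have not seen one, here is a minimal sketch of the kind of Bayesian scoring being discussed, and of what padding a spam with random text does to it. Everything in it is an assumption for illustration: the per-token probabilities are toy numbers, the 0.4 value for unknown words and the cutoff of 15 tokens are assumed settings, and this is not how CRM114 or DSPAM work internally.

```python
import math

# Toy per-token spam probabilities (hypothetical numbers, not measured data).
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "mortgage": 0.95,
    "free": 0.90,
    "meeting": 0.05,
    "patch": 0.03,
}
UNKNOWN_PROB = 0.4   # assumed probability for tokens never seen in training
MAX_TOKENS = 15      # assumed cutoff: only the most telling tokens are combined


def spam_score(tokens):
    """Combine per-token probabilities with the naive Bayes rule."""
    unique = {t.lower() for t in tokens}
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_PROB) for t in unique]
    # Keep the tokens whose probability is farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:MAX_TOKENS]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    # P(spam) = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


plain = ["viagra", "free", "mortgage"]
padded = plain + ["zx%d" % i for i in range(20)]  # 20 never-seen filler words

print(spam_score(plain))
print(spam_score(padded))
```

In this toy setup the padded message still scores above 0.9, because only twelve filler words make it into the top fifteen and each one is close to neutral. Filler made of words the filter has learned as strongly legitimate would pull the score down in this model, so how well padding works depends on whose mail the filter was trained on.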
Re:Adaptive adversaries (Score:3, Informative)
That does not work. If anything, it makes the spam easier to identify, especially dictionary-salad-type spams that just list random words most of which real people hardly ever use in actual emails. Dictionary salad just gives the Bayesian classifier more spam terms to work with. The rest of the terms, the ones that are common in real emails, converge on a neu
Re:Adaptive adversaries (Score:3, Insightful)
But at my old university, which has 40000 users, this has completely defeated their Bayesian filters. They say that the disk and CPU needed for per-user Bayesian training are prohibitively expensive, and they found that training for all users was doing more harm than good.
So, we definitely need more approaches to the problem.
Re:Adaptive adversaries (Score:3, Informative)
I can't see how that would change anything. The "bad" keywords are still in the spam. The gobbledy-gook words (usually short clips of random books/stories/something) are legitimate words, but aren't very likely to have a high coincidence of words found in my legitimate email.
I'm not
Could somebody explain this to me... (Score:5, Interesting)
1. display my email online as little as possible
2. use a number of addresses that all filter into one account, then filter by the sent-to address... this has turned up some VERY interesting results. For instance, I used dellorders@mydomain.com for an order from Dell, and NEVER used it or even typed it anywhere again, and started getting spam about 6 months later, and I mean the nasty stuff, not just innocent stuff from Dell resellers...
3. i built a rudimentary filter that looks for viagra,free,debt,enlarge, etc... if the sender is not in my address book, and the email contains these words, it is sent to a "check these out" folder...
How might a spam filter help me out without zapping confirmation type emails?
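The keyword rule in item 3 can be sketched in Python roughly as follows. The word list comes from the post; the function name, folder names and addresses are placeholders, and the poster's actual filter may well differ.

```python
import re

SUSPECT_WORDS = {"viagra", "free", "debt", "enlarge"}  # words named in the post


def choose_folder(sender, body, address_book):
    """Return the folder a message should be filed in."""
    known = sender.strip().lower() in {a.lower() for a in address_book}
    words = set(re.findall(r"[a-z]+", body.lower()))
    # Unknown sender plus at least one suspect word: hold for manual review.
    if not known and words & SUSPECT_WORDS:
        return "check these out"
    return "inbox"


book = ["mom@example.org"]
print(choose_folder("shop@example.com", "Free shipping on your order", book))
print(choose_folder("Mom@example.org", "free for dinner?", book))
```

Note that the first message, a plausible order confirmation from a new vendor, lands in the review folder rather than being deleted. That is the trade-off of a rule this crude: confirmation mail survives, but only because everything it flags still needs a human to look at it.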
Re:Could somebody explain this to me... (Score:5, Informative)
foobar+dellorders@mydomain.com.
Re:Could somebody explain this to me... (Score:4, Informative)
Re:Could somebody explain this to me... (Score:4, Informative)
AFAIK, username-filtername will still just go to username-filtername, i.e. you have to configure your mail server to handle username-filtername separately from username. This works great when you can specify as many usernames as you want (i.e. if you manage your own server or have a catch-all on your domain).
Maybe you are talking about something different than the original poster?
One reason why the - would work when the + does not is that the - can appear multiple times, so it is just another valid character (like a letter, number, or underscore). The + can only appear once, so many servers can ignore it, drop it, or puke on it.
Interestingly enough, while the (optional) challenge/response system is what gets the press, the main purpose of TMDA is to create aliases like username-filter (and then filter based on them). Thus the name: *Tagged* Message Delivery Agent. The -filter is the tag of Tagged.
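As a rough sketch of filtering on the tag, here is one way to pull it out of a recipient address in Python. The function name and the choice to split at the first separator are assumptions for illustration; real mail servers and TMDA have their own rules for this.

```python
def split_tagged(address, separators="+-"):
    """Split 'user+tag@domain' into (user, tag, domain); tag may be ''."""
    local, _, domain = address.rpartition("@")
    # Positions of every separator character that occurs in the local part.
    positions = [local.find(s) for s in separators if s in local]
    if not positions:
        return local, "", domain
    cut = min(positions)  # split at the first separator found
    return local[:cut], local[cut + 1:], domain


print(split_tagged("foobar+dellorders@mydomain.com"))
print(split_tagged("username-filtername@example.com"))
print(split_tagged("plain@example.com"))
```

The caveat from the comment above applies: since "-" is an ordinary character in a local part, an untagged address such as mary-jane@example.com would be split incorrectly by this sketch, so pass `separators="+"` unless you control how usernames are assigned.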
This is just carp. (Score:3, Insightful)
To determine the accuracy of a spam detector, it is necessary first to come up with a sample of what is or isn't Spam. (I'd assume a human would do this?) So the best result we can get by evaluating humans is how often they agree with the result of the initial label.
This figure probably won't be 100%. People have slightly different concepts of what mail is requested vs. unwanted, and what is advertising or useful information. So there is a valid possibility of disagreement.
That doesn't mean humans can't do the job accurately. (After all, if they couldn't, then the initial human-made labels would themselves be wrong and any data based on them meaningless!)
If the training data is labeled with the same criteria as the test data, it is obviously possible that a trained system can achieve results which more closely agree with the test data. They are being trained on similar data. But that doesn't mean that the system is MORE accurate at detecting spam than humans. It means that the system agrees with a particular human (or set of humans) more than other people do in a labelling of spam/non-spam.
For all we know, the evaluator's idea of spam is "wrong".
Re:This is just carp. (Score:5, Insightful)
The point is that humans also aren't perfect. Have a person classify 10000 emails and they will make a few mistakes. Point out those mistakes, and they will say "yes, I got that wrong it is an email from my wife reminding me to pick up milk and not a spam trying to sell me printer ink, I must have been day dreaming."
Just like if you give a person a document and say "find all the spelling errors" they will probably miss some. This is not because they have a different definition of how those words are spelt, it is because they made some mistakes.
For the training/testing data, some double checking needs to be done to find the mistakes the human classifying it almost certainly made.
It's a pretty normal situation in any machine learning application, you don't have to be perfect to be as good as a human - after all humans are only human.
The true test of a spam filter... (Score:5, Insightful)
Image Noise Reduction and Machine Learning (Score:4, Interesting)
Let's get this straight people! (Score:4, Insightful)
The biggest problem with spam is the invasion of third party computers on the Internet. The ILLEGAL activity spammers perpetrate by breaking into machines, forging headers and hijacking servers.
Any filtering method does not address this most serious problem, and even if you do not see any spam in your inbox, you're still paying for the bandwidth and system resources these spammers steal.
Stop with the filtering algorithms and take some of that energy and contact your local Attorney General, DA and FBI and demand that they prosecute these people who are BREAKING THE LAW.
Re:Let's get this straight people! (Score:3, Informative)
Laws are fine, but what would *really* work is if everyone were filtering spam, and everyone tells all their newbie friends & relatives what spam is and installs blocking software for them. If sending 1,000,000 spams no longer res
Re:Let's get this straight people! (Score:3, Interesting)
When you filter e-mail at the client or server side based on content, the spammers have no idea that their efforts are truly ineffective. At least RBLs send them a message. Content-based filtering is TOTALLY, TOTALLY ineffective. Yea, it makes the spam go away for a
One number not enough (Score:5, Insightful)
How not to evaluate filters (Score:5, Insightful)
Also, I wonder how many people have actually looked at CRM114 and tried to use it.
The really interesting thing about CRM114 is the windowed polynomial hashing technique used although there's some evidence that it can work just as well (if not better) on a much smaller window of only two tokens. I'm hoping someone will do a full exploration of the idea for SpamAssassin's Bayes module.
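To give a feel for what windowed phrase features look like, here is a loose Python sketch. It is not CRM114's code: the feature layout, the `<skip>` placeholder, the use of CRC32 and the bucket count are all assumptions made for illustration. The idea shown is only that each new token is combined with every keep-or-skip choice over the tokens just before it, and each resulting phrase is hashed to a bucket.

```python
import zlib
from itertools import product


def window_features(tokens, window=5, skip="<skip>"):
    """Build phrase features from a sliding window of tokens."""
    feats = []
    for i, newest in enumerate(tokens):
        earlier = tokens[max(0, i - window + 1):i]
        # Every keep/drop choice over the earlier tokens gives one feature;
        # the newest token is always kept.
        for mask in product((False, True), repeat=len(earlier)):
            parts = [t if keep else skip for t, keep in zip(earlier, mask)]
            feats.append(" ".join(parts + [newest]))
    return feats


def bucket(feature, n_buckets=1 << 20):
    """Map a feature string to a hash bucket (hypothetical table size)."""
    return zlib.crc32(feature.encode("utf-8")) % n_buckets


for feat in window_features(["buy", "cheap", "pills"], window=2):
    print(bucket(feat), feat)
```

With `window=2` each token yields itself alone plus itself paired with its predecessor, which is the "only two tokens" case mentioned above. With `window=5` a token that has four predecessors yields sixteen features, so the feature count grows quickly with window size.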
Do we buy viagra 0.16% of the time (Score:3, Insightful)
I read the email and delete it. Exactly the same as the spam filters do it, only MORE accurately. I think the tests applied would have been between a human reading the header of an email and deciding whether to open it or not versus the spam filter making the decision for us. BUT the spam filter makes its decision by opening the email. Therefore to have a proper comparison I should be allowed to open the email as well before I make the decision. Therefore I am 100% accurate.
The CRM114? (Score:4, Funny)
Human accuracy doesn't scale linearly (Score:5, Insightful)
Spot the reference... (Score:5, Informative)
ObKubrick: In 2001: A Space Odyssey, one of the pods was marked with the designation CRM-114. And in Clockwork Orange, Alex is injected with serum 114. I suppose CRM-114 is to Kubrick as THX1138 is to Lucas.
Dobly, on the other hand, is from This is Spinal Tap [imdb.com], a mispronunciation of "Dolby" by David St. Hubbins's girlfriend:
Not to mention that it probably avoids trademark infringement (though I wouldn't put it past Dolby Labs or Thomas Dolby to raise a stink).
Maj. Kong
Dolby-type noise reduction algorithm called Dobly? (Score:4, Interesting)
Digital signatures and a public key infrastructure (Score:3, Insightful)
If every user or at least every server had a key and we all signed each others keys creating a web of trust and only accepted signed and trusted mail the spam problem would be solved. I really dislike the way SSL certificates are handed out. A central CA is a very bad idea due to the cost and browser lock-in issues etc. With GPG and web of trust if you want to run a mail server you need to talk to a friend who is already running one and get them to sign your key. Perhaps we could even use DNS to propagate and cache the keys and sigs. If you sign a key that turns out to be a spammer you better revoke that signature fast before the person upstream from you revokes yours. Problem solved. Now if only we could get the big guys to go along with it...
Overkill (Score:3, Interesting)
Look at http://spf.pobox.com/ which is sufficient. With SPF, you know that if you are getting SPAM saying it is from @ultraviolet.org, then it really is from @ultraviolet.org (or at least someone who ultraviolet.org trusts).
Your solution requires a certain level of technical proficiency (setting up and managing the key) of *all* participants. SPF's solution only requires technical proficie
Share the luxury (Score:5, Interesting)
Having such a powerful statistical spam filter is definitely a luxury. I have no difficulty believing the accuracy values presented here. I have had experience with spamprobe, CRM114, bogofilter, spambayes, and spamassassin and all of these do an amazing job to the point where spam no longer exists (for you).
Which leads me to plug a little project called WPBL [pc9.org] that uses exactly these types of statistical spam filters to spot spam sources in a distributed fashion. Each project member uploads hourly the IPs they see relaying spam and non-spam, where the 'decision' is made by these extremely reliable filters. This effectively converts your regular mail account into an intelligent spam-trap that feeds a central blocklist.
The more members we get, the better we can identify active spam sources around the world. This information is then used by some sites for quite large-scale blocking [dnsbl.net.au]. Since you're doing all this filtering processing anyway, why not also share "what you learn" (the IPs that are spamming you)?
If this grabs your interest, read up on the reporting scripts [pc9.org] or alternatively, the open WPBL data upload protocol [pc9.org] if you want to code your own report generator. Bandwidth usage is minimal.
Well (Score:3, Insightful)
DRACO-
Re:Spamassassin (Score:3, Interesting)
SpamAssassin is a single approach. It looks at a bunch of features, then combines them linearly and compares the result against a threshold function. It's a relatively simplistic method, compared to these two. Not hard to see how more sophisticated methods could do better.
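The "combine linearly and compare against a threshold" idea looks roughly like this in Python. The rule names, weights and cutoff are hypothetical values chosen for illustration, not SpamAssassin's actual rules or scores.

```python
# Hypothetical rule names and weights, chosen only for illustration.
RULE_WEIGHTS = {
    "subject_all_caps": 1.5,
    "mentions_viagra": 3.0,
    "html_only_body": 1.0,
    "sender_in_address_book": -4.0,
}
THRESHOLD = 5.0  # assumed cutoff


def linear_score(fired_rules):
    """Sum the weights of every rule that matched the message."""
    return sum(RULE_WEIGHTS.get(rule, 0.0) for rule in fired_rules)


def is_spam(fired_rules, threshold=THRESHOLD):
    """Flag the message when the summed score reaches the threshold."""
    return linear_score(fired_rules) >= threshold


hits = ["subject_all_caps", "mentions_viagra", "html_only_body"]
print(linear_score(hits), is_spam(hits))
print(is_spam(hits + ["sender_in_address_book"]))
```

Because the score is a plain sum, each rule contributes the same amount no matter which other rules fired. A filter that looks at word sequences or phrase combinations can pick up interactions that a weighted sum like this cannot express.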
Re:How can a human be wrong? (Score:5, Informative)
[*Bing* -- mail from VP of sales pops into my inbox. Subject: "Making money fast!"]
[*Bam* -- I hit delete, thinking "Stupid Spam!"]
Ahh, shit! Lookie, a human screwed up.
The filter would have actually examined the message and probably decided that it was legitimate.
Re:Combined accuracy? (Score:3, Interesting)
Re:knowspam.net (Score:3, Insightful)
That's a problem. (Score:3, Interesting)
I think 'unsolicited request for money from a for profit organization' will fit into everybody's base definition. Some people will expand on it, but we need a defined place to start.
Re:Case study in linguistics (Score:3, Insightful)
The language module does invoke other parts of the brain, such as general knowledge; however, there's nothing in the process that depends on it being in a human brain. Given that cognition is a physical process, one could postula