
Two Spam Filters 10 Times As Accurate As Humans

Posted by timothy on Monday February 23, 2004 @09:13PM from the dev/null-is-getting-fatter dept.

Nuclear Elephant writes "The authors of two spam filters, CRM114 and DSPAM, announced recently that their filters have achieved accuracy rates ten times better than a human is capable of. Based on a study by Bill Yerazunis of CRM114, the average human is only 99.84% accurate. Both filters are reported to have reached accuracy levels between 99.983% and 99.984% (1 misclassification in 6250 messages) using completely different approaches (CRM114 touts Markovian, while DSPAM implements a Dolby-type noise reduction algorithm called Dobly). If you're looking for a way to rid your inbox of spam, roll on over to one of these authors' websites."

This discussion has been archived. No new comments can be posted.


Comment removed (Score:5, Insightful)

by account_deleted ( 4530225 ) writes: on Monday February 23, 2004 @09:15PM (#8368805)

Comment removed based on user account deletion

wait, WTF? (Score:5, Insightful)

by PedanticSpellingTrol ( 746300 ) writes: on Monday February 23, 2004 @09:15PM (#8368812)

I presume they mean more accurate than a human that was only looking at the subject line? I fail to see how someone could misclassify an email after they'd already opened it unless it was some kind of marathon testing, which would be totally unrepresentative of any real life situation. Once you're getting 6,000 messages, it's time to reach for "Delete All" and change your address, methinks

SPAM definition (Score:2, Insightful)

by Embedded Geek ( 532893 ) writes: on Monday February 23, 2004 @09:16PM (#8368825) Homepage

Isn't the rough definition of SPAM "Anything I don't want in my mailbox"? If that's the case, isn't the human score going to be 100% (at least for the intended recipient)?

Re:Huh? Aren't humans 100%? (Score:2, Insightful)

by Phillup ( 317168 ) writes: on Monday February 23, 2004 @09:18PM (#8368848)

I agree 100%.

If I say it is spam, I'm not reading it... and I am deleting it.

Any software that tries to stop me is removed via

rm -Rf

because it is faulty.

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by Behrooz ( 302401 ) writes: on Monday February 23, 2004 @09:18PM (#8368850)

I suppose it depends how you're defining spam. Perhaps the ultimate spam messages that don't get past them are capable of passing a Turing test... hence fooling those gullible human recipients into thinking that it isn't even spam!

Fortunately, soon we will all be able to use the superhuman spam-detection capabilities of these filters to save us from ourselves. Imagine all of those pesky e-mails from your 'friends' getting caught by your spam filter before they even impinge upon your consciousness.

It'd be a wonderful world.

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by gid13 ( 620803 ) writes: on Monday February 23, 2004 @09:20PM (#8368878)

If you read the post, it quotes a study and says humans are only accurate 99.84% of the time.

Kinda makes you wonder how they can know the filters are right though. :)

(please don't reply telling me how)

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by mattkime ( 8466 ) writes: on Monday February 23, 2004 @09:23PM (#8368927)

Obviously you've never seen someone new to the internet sit in front of their computer. Lots of people don't know what popups are. Lots of people read some spam not knowing what it is. To these people, a computer is merely an interesting string of sensations.

I'm sure they're great, but... (Score:5, Insightful)

by LesPaul75 ( 571752 ) writes: on Monday February 23, 2004 @09:24PM (#8368935) Journal

I'm also sure that Yahoo's "SpamGuard" was great when they first introduced it. Now, it catches roughly half of all the spam I get. Why? Because people have figured out how it works and taken advantage of it. The same will happen with any content-recognition-based spam software. In the extreme case, even if a piece of software were 100% accurate at saying "This piece of e-mail looks like spam," then spammers would just make their e-mails look exactly like e-mail from one of your buddies. How could software ever tell the difference between:

Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
Your Buddy

and

Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
SpamKiddy

Even a human can't tell the difference. The only real difference is who they're from.

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by Celandro ( 595953 ) writes: <celandro AT gmail DOT com> on Monday February 23, 2004 @09:25PM (#8368943)

Perhaps they mean that Human A is reading email intended for Human B and attempting to classify the email as spam or not spam. I wouldn't be surprised if a computer could do a better job at that sort of task. Besides, I'm sure Human B wouldn't want Human A reading that cyber sex chat log.

Adaptive adversaries (Score:5, Insightful)

by Pendersempai ( 625351 ) writes: on Monday February 23, 2004 @09:28PM (#8368977)

It's really easy to design an effective solution when the problem is purely mechanical or natural. As long as you're working with spammers who don't adapt, you can slice through their shitstorms very effectively.

But when a single solution becomes mainstream, spammers will adapt to it. Bayesian filters tend to work very well, but now spammers are adding sprawls of randomly generated green-light text to offset the filter's score.

Google found an excellent way to rank websites, but then it became widespread enough that webmasters began to game the system it had created. It's been playing catch-up ever since.

Once the adversary begins to adapt, we lapse into the same cat-and-mouse game of technological barriers and counter-barriers that we've seen so many times before.
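The padding trick mentioned above can be illustrated with a toy word-probability filter. This is only a sketch under stated assumptions: the per-word probabilities and the `spam_score` helper are made up for illustration, and this is not the algorithm used by CRM114, DSPAM, or any real filter.

```python
import math

# Hypothetical per-word spam probabilities, as a trained filter might hold them.
WORD_SPAM_PROB = {
    "viagra": 0.99, "cheap": 0.90, "click": 0.85,
    "meeting": 0.05, "lunch": 0.04, "patch": 0.03, "kernel": 0.02,
}
UNKNOWN = 0.5  # neutral value for words the filter has never seen


def spam_score(text):
    """Combine per-word probabilities naive-Bayes style into a score in (0, 1)."""
    log_odds = 0.0
    for word in text.lower().split():
        p = WORD_SPAM_PROB.get(word, UNKNOWN)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


plain = "cheap viagra click"
# The same pitch padded with words the filter associates with legitimate mail.
padded = plain + " meeting lunch patch kernel"

print(spam_score(plain))   # close to 1: flagged as spam
print(spam_score(padded))  # well below 0.5: the padding offsets the score
```

A filter trained on one user's own mail would hold different "innocent" words than the ones a spammer guesses, which is one reason the padding may work less well against personally trained filters.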

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by evilmrhenry ( 542138 ) writes: on Monday February 23, 2004 @09:29PM (#8368989)

Quite simple:
With 10 messages (after automatic spam detection), humans are 100% accurate.

With 1,000 messages (before automatic spam detection), humans are less than 100% accurate.

The experiment was done on 5849 messages.

Remember; one thing computers are good at is doing boring things repeatedly.

This is just carp. (Score:3, Insightful)

by corian ( 34925 ) writes: on Monday February 23, 2004 @09:33PM (#8369021)

Spam is what is defined by humans as Spam.

To determine the accuracy of a spam detector, it is necessary first to come up with a sample of what is or isn't Spam. (I'd assume a human would do this?) So the best result we can get by evaluating humans is how often they agree with the result of the initial label.

This figure probably won't be 100%. People have slightly different concepts of what mail is requested vs. unwanted, and what is advertising or useful information. So there is a valid possibility of disagreement.

That doesn't mean humans can't do the job accurately. (After all, if they couldn't, then the initial human-made labels would themselves be wrong and any data based on them meaningless!)

If the training data is labeled with the same criteria as the test data, it is obviously possible that a trained system can achieve results which more closely agree with the test data. They are being trained on similar data. But that doesn't mean that the system is MORE accurate at detecting spam than humans. It means that the system agrees with a particular human (or set of humans) more than other people do in a labelling of spam/non-spam.

For all we know, the evaluator's idea of spam is "wrong".

Re:Huh? Aren't humans 100%? (Score:3, Insightful)

by BillyBlaze ( 746775 ) writes: <tomfelker@gmail.com> on Monday February 23, 2004 @09:39PM (#8369070)

If you have no spam filters, then classifying email amounts to "delete, delete, delete, delete, down-arrow, delete, delete, down-arrow, delete, delete, whoops!" That one mistake just dropped your average to 90%. Frankly, I'm amazed humans scored as well as they did.

The true test of a spam filter... (Score:5, Insightful)

by GrpA ( 691294 ) writes: on Monday February 23, 2004 @09:39PM (#8369072)

Results of new spam filters cannot help but be bogus... The true test of a filter is how well it works *after* all the spammers know how it works and try to circumvent it.

Re:Bleh. (Score:2, Insightful)

by Mmm_Coco ( 718592 ) writes: on Monday February 23, 2004 @09:40PM (#8369085)

Programs outperform humans all the time. Where am I? My GPS knows. What was that person's number? My PDA knows. What is 2365 times 8675309? Just use a calculator: 20517105785. Wow, I was just outperformed three times in the space of a minute.

More accurate than what..? (Score:2, Insightful)

by EdMcMan ( 70171 ) writes: <moo.slashdot2.z.edmcman@xoxy.net> on Monday February 23, 2004 @09:44PM (#8369129) Homepage Journal

If humans don't have 100% accuracy, who/what is defining what spam is?

Let's get this straight people! (Score:4, Insightful)

by mabu ( 178417 ) writes: on Monday February 23, 2004 @09:44PM (#8369131)

client/server-side filtering does NOT solve the problem!

The biggest problem with spam is the invasion of third party computers on the Internet. The ILLEGAL activity spammers perpetrate by breaking into machines, forging headers and hijacking servers.

Any filtering method does not address this most serious problem, and even if you do not see any spam in your inbox, you're still paying for the bandwidth and system resources these spammers steal.

Stop with the filtering algorithms and take some of that energy and contact your local Attorney General, DA and FBI and demand that they prosecute these people who are BREAKING THE LAW.

Re:Huh? (Score:1, Insightful)

by nacturation ( 646836 ) writes: <nacturation@NoSpaM.gmail.com> on Monday February 23, 2004 @09:50PM (#8369185) Journal

99.84% accuracy rate means misclassifying 1 email in every 625 you receive. Are you really that accurate that you don't make a single mistake in almost a thousand emails? Here, "mistake" can mean reading an email you thought was valid but it turned out to be spam; or deleting an email you thought was spam but it really was valid.
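The arithmetic behind the "1 in 625" figure, and behind the story's "ten times" claim, can be checked with a few lines. This is just a sketch; `one_error_in` is a name invented here, and the percentages are the ones quoted in the story.

```python
def one_error_in(accuracy_percent):
    """Return N such that the given accuracy means 1 mistake per N messages."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return 1.0 / error_rate


human = one_error_in(99.84)    # about 1 mistake in 625 messages
filt = one_error_in(99.984)    # about 1 mistake in 6250 messages
ratio = filt / human           # about 10: the error rate is ten times lower

print(round(human), round(filt), round(ratio, 3))
```

Note that "ten times" compares error rates (0.16% versus 0.016%), not the accuracy figures themselves, which differ by less than 0.15 percentage points.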

Re:This is just carp. (Score:5, Insightful)

by sholden ( 12227 ) writes: on Monday February 23, 2004 @09:50PM (#8369187) Homepage

They are learning algorithms. For measuring their accuracy you have to assume that the data is correctly classified so you can see how they do.

The point is that humans also aren't perfect. Have a person classify 10000 emails and they will make a few mistakes. Point out those mistakes, and they will say "yes, I got that wrong it is an email from my wife reminding me to pick up milk and not a spam trying to sell me printer ink, I must have been day dreaming."

Just like if you give a person a document and say "find all the spelling errors" they will probably miss some. This is not because they have a different definition of how those words are spelt, it is because they made some mistakes.

For the training/testing data, some double checking needs to be done to find the mistakes the human classifying it almost certainly made.

It's a pretty normal situation in any machine learning application, you don't have to be perfect to be as good as a human - after all humans are only human.

Re:Huh? (Score:2, Insightful)

by fprefect ( 14608 ) writes: on Monday February 23, 2004 @09:51PM (#8369192)

How can you be sure that you've never deleted an important email as spam?

economics of spam (Score:1, Insightful)

by Anonymous Coward writes: on Monday February 23, 2004 @09:51PM (#8369193)

Most people don't receive hundreds of pieces of junk mail every day. Spammers can make money with only a VERY small percentage of the recipients actually responding. If you send spam to a million people and only 0.01% buy your product you still sold 100 units of your product. If it cost a tenth of a cent to send each email then you would need to make at least $10 per unit under the current economic model to have it still be profitable.
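The break-even figure above can be reproduced directly. This is a sketch using only the numbers given in the comment; the function name and its parameters are invented for illustration.

```python
def break_even_margin(recipients, response_rate, cost_per_email):
    """Return (units_sold, total_cost, margin needed per unit to break even)."""
    units_sold = recipients * response_rate
    total_cost = recipients * cost_per_email
    return units_sold, total_cost, total_cost / units_sold


# One million recipients, 0.01% response, a tenth of a cent ($0.001) per email.
units, cost, margin = break_even_margin(1_000_000, 0.0001, 0.001)
print(units, cost, margin)  # about 100 units, $1000 total, $10 per unit
```

The same function shows how sensitive the model is to sending cost: at a hundredth of a cent per message, the required margin would fall to about $1 per unit.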

Re:Huh? Aren't humans 100%? (Score:2, Insightful)

by DougWhite ( 72757 ) writes: <abmolyre@ameriAA ... inus threevowels> on Monday February 23, 2004 @09:52PM (#8369198)

Not to sound like a litigation whore, but ...

I wonder if it would be possible to sue these spammers for interfering with a business transaction. Granted, the amount in question here is minimal, but just the possibility that a spammer could be found liable for this might deter some of them.

If that doesn't work we should sign up every megacorp CEO on every spammer list possible, and hope s/he misses an important memo costing megacorp millions. Then megacorp could sue spammer into oblivion.

One number not enough (Score:5, Insightful)

by blamanj ( 253811 ) writes: on Monday February 23, 2004 @09:54PM (#8369220)

Saying an algorithm is x% accurate is not sufficient, because there are two kinds of errors: false acceptance of spam, and false rejection of non-spam. Personally, I'd settle for a filter that caught only 90% of spam if I knew its false reject rate was 0%, rather than have a program that was 99% accurate at both.
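To make the two error types concrete, here is a small sketch that reports them separately alongside plain accuracy. The message counts and both "filters" are made up for illustration; nothing here comes from the study.

```python
def error_rates(labels, predictions):
    """labels/predictions are lists of booleans, True meaning 'spam'.

    Returns (false_accept_rate, false_reject_rate):
      false_accept_rate: share of spam that was let through
      false_reject_rate: share of legitimate mail that was rejected
    """
    spam_total = sum(1 for y in labels if y)
    ham_total = len(labels) - spam_total
    missed_spam = sum(1 for y, p in zip(labels, predictions) if y and not p)
    lost_ham = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return missed_spam / spam_total, lost_ham / ham_total


def accuracy(labels, predictions):
    """Share of messages classified correctly, ignoring the kind of error."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Made-up mailbox: 90 spam messages followed by 10 legitimate ones.
labels = [True] * 90 + [False] * 10
# Filter A catches all spam but also rejects one legitimate message.
filter_a = [True] * 90 + [True] + [False] * 9
# Filter B lets 9 spam through but never rejects legitimate mail.
filter_b = [False] * 9 + [True] * 81 + [False] * 10

print(accuracy(labels, filter_a), error_rates(labels, filter_a))
print(accuracy(labels, filter_b), error_rates(labels, filter_b))
```

Filter A scores higher on the single accuracy number, yet it is the one that loses real mail, which is the distinction the comment is drawing.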

How not to evaluate filters (Score:5, Insightful)

by Daniel Quinlan ( 153105 ) writes: on Monday February 23, 2004 @09:55PM (#8369228) Homepage

The study referenced is:
- On the author's mail (where he probably mostly talks about CRM114 and probably does not subscribe to many newsletters or non-technical mailing lists).
- A pre-trained filter. It can't be compared apples-to-apples with any filter that doesn't require training.
- Using his own filter on his own mail! Of course it does well.
... to mention a few of the problems. The statistics and methodology behind these claims are really questionable. I think Consumer Reports and PC Magazine have both done better evaluations of spam filters (read that however you want).

Also, I wonder how many people have actually looked at CRM114 and tried to use it.

The really interesting thing about CRM114 is the windowed polynomial hashing technique used, although there's some evidence that it can work just as well (if not better) on a much smaller window of only two tokens. I'm hoping someone will do a full exploration of the idea for SpamAssassin's Bayes module.

Re:Huh? (Score:2, Insightful)

by Mysteray ( 713473 ) writes: on Monday February 23, 2004 @09:56PM (#8369237)

Would someone like to explain how a program (even if it's right 99.something% of the time) is more accurate than me (100%)?

That's an easy one. The computer is 10 times better at recognizing what it has decided is spam. We humans are lucky to even be in the same league.
Now that you understand that, you're one step closer to being "computer literate".

Do we buy viagra 0.16% of the time (Score:3, Insightful)

by nri ( 149893 ) writes: on Monday February 23, 2004 @09:58PM (#8369258)

If we humans are only 99.84% accurate, then 0.16% of the time we will incorrectly think the email is real and buy viagra? I don't think so.
I read the email and delete it. Exactly the same as the spam filters do it, only MORE accurately. I think the tests applied would have been between a human reading the header of an email and deciding whether to open it or not versus the spam filter making the decision for us. BUT the spam filter makes its decision by opening the email. Therefore to have a proper comparison I should be allowed to open the email as well before I make the decision. Therefore I am 100% accurate.

They're trying to sell you something (Score:3, Insightful)

by brucmack ( 572780 ) writes: on Monday February 23, 2004 @10:00PM (#8369267)

The thing with spam is that it's supposed to be a way for somebody to make money... i.e. they are trying to sell you something, be it directly or indirectly. I can't think offhand of an email I have recently received that could be misconstrued as trying to sell me something. From that simple viewpoint, spam can never look exactly like regular mail, because it has a different purpose.

Re:knowspam.net (Score:3, Insightful)

by perlchild ( 582235 ) writes: on Monday February 23, 2004 @10:02PM (#8369277)

until the next "trinoo-like" proxy allows spammers to send email from a desktop near you...

Re:Huh? Aren't humans 100%? (Score:4, Insightful)

by bhanafee ( 145604 ) writes: on Monday February 23, 2004 @10:14PM (#8369386) Homepage

No, humans aren't 100% and yes, you can test for that. Try a thought experiment: fill a bin with 50,000 red balls and 50,000 blue balls. Ask a human to sort them all. The result probably won't be 100%, but you can still check the result and figure out how accurate the human is without relying on a superhuman ability to tell the balls apart. Same thing for spam: if you start with a known training set, you can test humans to see how well the spam is identified by manual sorting.

Human accuracy doesn't scale linearly (Score:5, Insightful)

by Kaboom13 ( 235759 ) writes: <kaboom108.bellsouth@net> on Monday February 23, 2004 @10:14PM (#8369396)

I'm not surprised a filter beat the human, considering the study used a sample of 5849 messages. As the sample size increases, the filter's accuracy will increase, and the human's will decrease. Furthermore, the higher the spam/real ratio, the better the filter will do in comparison to a human trying to sort at a reasonable speed. The reason is that humans tend to skim, and rarely actually read entire subjects, much less messages. Give a human 5000 messages and an hour and he will probably make some mistakes. On the other hand, in 10 messages, the human will probably be 100% correct. Most email filters rely on this already, as they tend to err on the side of caution. With the bulk of the spam taken out, it is not a burden to have the human check the iffy bits. Furthermore, the type of email can mislead humans. A business-type email sent to someone's personal email is much more likely to be mistaken as spam, and vice versa. The main disadvantage of automated filtering is that people generally have an idea of when a really important e-mail is going to come (the type for which false positives are completely unacceptable) and who it will be from.

Re:I'm sure they're great, but... (Score:3, Insightful)

by RedWizzard ( 192002 ) writes: on Monday February 23, 2004 @10:16PM (#8369404)

Even a human can't tell the difference. The only real difference is who they're from.

And that is all you need. I want website recommendations from friends, I don't want them from random spambots. That's enough for a human or a program to decide that one of those messages is spam and one is not.

Re:2+2=3 (Score:5, Insightful)

by kfg ( 145172 ) writes: on Monday February 23, 2004 @10:19PM (#8369441)

Congratulations, Mon Ami.

You have just unlocked the secret of virtually every news report that says "ten times more likely."

To get cancer. To have a heart attack. To suffer from the heartbreak of psoriasis. Whatever.

Yes, these numbers indicate "10 times better," and if you were to ask the reporter how likely am I to avoid cancer in both situations, these are the sorts of numbers he would show you.

Eat health food and your chance of having a heart attack is 99.984%. Eat too many donuts and your chance of having a heart attack is 99.983%, 10 times worse!

Always, always, always ask to see the raw numbers so that you know what "10 times worse" means.

Then ask if the numbers were collected by phone survey. If they were, throw them all away and have a donut and a cup of coffee.

KFG

Re:Huh? Aren't humans 100%? (Score:1, Insightful)

by Anonymous Coward writes: on Monday February 23, 2004 @10:24PM (#8369490)

How do you know your training set is correct?

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by Anonymous Coward writes: on Monday February 23, 2004 @10:31PM (#8369544)

The post quotes "a study" which gives the 99.84% figure. In fact, the 99.84% figure is mentioned in the one paper as "the human author's measured accuracy as an antispam filter...on the first pass". This is what we who understand statistics call "nonsense". An individual human had an estimated accuracy of 99.84% when looking at one particular sample set of data, once. This is not a meaningful number, and it sure as heck ain't "a study".

Re:Huh? (Score:2, Insightful)

by iMoron ( 69463 ) writes: on Monday February 23, 2004 @10:40PM (#8369646)

By your definition, every spam message is a mistake for the spam filter because it "reads" all of them (at least to the same extent as it "reads" any non-spam email). The filter is more accurate because it is fast enough to be more thorough than any human can possibly be expected to be. If we could thoroughly analyze hundreds of emails in a matter of seconds, we would have no need for spam filters. We have spam filters because we don't have the time (or the patience, for that matter) to be as careful as a filter.

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by Trejkaz ( 615352 ) writes: on Monday February 23, 2004 @11:17PM (#8369897) Homepage

But the computer reads the entire message, so it's not really a fair comparison, is it? How many more lines of information was the computer allowed to look at to create its superior result?

Re:Huh? Aren't humans 100%? (Score:1, Insightful)

by Anonymous Coward writes: on Monday February 23, 2004 @11:45PM (#8370099)

Lots of people don't know what popups are.

Yes, we call those people "surfers who don't use Internet Explorer" (seeing as pretty much every other browser has options to kill them).

SCNR

Digital signatures and a public key infrastructure (Score:3, Insightful)

by Tracy Reed ( 3563 ) writes: <treed@ultr[ ]olet.org ['avi' in gap]> on Tuesday February 24, 2004 @12:47AM (#8370509) Homepage

...are still the only real solution to the issue of trust, reputation, and accountability on the Internet. We need it for so many other things in addition to guaranteeing email legitimacy.

If every user or at least every server had a key and we all signed each other's keys, creating a web of trust, and only accepted signed and trusted mail, the spam problem would be solved. I really dislike the way SSL certificates are handed out. A central CA is a very bad idea due to the cost and browser lock-in issues etc. With GPG and web of trust, if you want to run a mail server you need to talk to a friend who is already running one and get them to sign your key. Perhaps we could even use DNS to propagate and cache the keys and sigs. If you sign a key that turns out to be a spammer you better revoke that signature fast before the person upstream from you revokes yours. Problem solved. Now if only we could get the big guys to go along with it...
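The signing-and-revocation idea can be sketched as a reachability check over a graph of signatures. This is a toy model only: the key names, the `trusted_keys` function, and the depth limit are assumptions made for illustration, and it does not reproduce GPG's actual trust computation.

```python
from collections import deque


def trusted_keys(root, signatures, max_depth=3):
    """Return the keys reachable from `root` via at most `max_depth` signatures.

    `signatures` maps a signer's key ID to the set of key IDs it has signed.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        key, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow signatures beyond the depth limit
        for signed in signatures.get(key, ()):
            if signed not in seen:
                seen.add(signed)
                queue.append((signed, depth + 1))
    return seen


# Hypothetical chain of signatures between mail server keys.
sigs = {
    "me": {"alice"},
    "alice": {"bob"},
    "bob": {"spammer"},
}
before = trusted_keys("me", sigs)
sigs["bob"].discard("spammer")  # bob revokes his signature on the spammer's key
after = trusted_keys("me", sigs)

print("spammer" in before, "spammer" in after)
```

In this model a single revocation by the nearest signer is enough to cut the spammer off from everyone downstream, which is the pressure the comment is counting on.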

Re:2+2=3 (Score:3, Insightful)

by kfg ( 145172 ) writes: on Tuesday February 24, 2004 @01:10AM (#8370640)

Yeah, I was waiting for someone to nail me on that. In fact I was waiting for someone to agree with me. :)

I totally buggered that whole section, but it was just so funny I let it stand with the errata note that I had buggered it.

Ironically people know I "eat healthy," so I'm frequently asked where they should go to buy healthy food, to which I almost always reply:

"For God's sake man, whatever you do, don't go in the health food store!"

"Well. . . where do I go then?"

"They've got these things now called "Supermarkets." Look, over here, brown rice, dried beans and lentils. Over here, the produce aisle. You need frickin' binoculars to see the end of the thing. Broccoli, Bok Choy, squash, potatoes to the ceiling, it's the middle of February and there are crates of oranges that were hanging on the tree a few days ago. Why go anywhere else?"

"But, but . . . what about organic?"

"Here, take my binoculars, look down there. No, to the right a little, yeah, see? A whole organic section if you want. Supermarkets today aren't the supermarkets of 20 years ago. They're catering to customer demand. Go figure.

But really, if you want my advice? Save your money. Only buy organic if the price is the same. If you eat the "normal" stuff there's a 99.84% chance it won't kill you. If you eat the organic there's a 99.984% chance it won't kill you, and they got those numbers by taking a phone survey, or from the I Ching, or something like that."

KFG

Not the best idea (Score:5, Insightful)

by Vainglorious Coward ( 267452 ) writes: on Tuesday February 24, 2004 @01:20AM (#8370691) Journal

What you're planning has already been done, it's called TMDA, and it's not such a good idea. You're going to send out 800 "challenge" emails per day - have you given any thought to how many of those will be genuine addresses, but have nothing to do with the spam you receive because they just happen to be the joe-job victim? These kinds of challenge/response systems may slightly alleviate your own suffering through spam, but at a cost to all those unfortunate enough to have had their email addresses faked. And if the sheer impoliteness of such net behaviour doesn't put you off, note that you're using up more of your own bandwidth to send out such challenges.

If any of the smtp exchange or address lookup fails, just forget it, they're probably not real anyway

It would make a lot more sense to make these kinds of checks when you're receiving the email in the first place. Reject at the SMTP level - you never accept and process the spam in the first place.

Re:Case study in linguistics (Score:3, Insightful)

by acb ( 2797 ) writes: on Tuesday February 24, 2004 @01:39AM (#8370785) Homepage

From what I gather, Pinker's theory is that language is implemented by a dedicated module in the human brain. This module is just neurological hardware, operating entirely by physical means, and does not invoke any sort of deus ex machina; therefore, what it does is an algorithm.

The language module does invoke other parts of the brain, such as general knowledge; however, there's nothing in the process that depends on it being in a human brain. Given that cognition is a physical process, one could postulate a computer program that could achieve the same results, even if drawing on a very large database of cultural information. The suggestion that language is "innately human" sounds a bit too much like carbon chauvinism, the belief that intelligence is an exclusive property of carbon-based life.

Re:Huh? Aren't humans 100%? (Score:2, Insightful)

by bananahammock ( 595781 ) writes: on Tuesday February 24, 2004 @02:48AM (#8371146)

That should explain why Dubya's always smiling even when he's trying to be serious.

Re:Help setting this up (Score:5, Insightful)

by SethJohnson ( 112166 ) writes: on Tuesday February 24, 2004 @03:12AM (#8371237) Homepage Journal

ModernGeek,

I recommend you stick with hotmail. Dabbling in stuff like SpamAssassin is going to be just too much work for someone as lazy as you sound. Apple makes a good built-in spam filter on its Mail client app. Why don't you go there?

Sample (Score:2, Insightful)

by Anonymous Coward writes: on Tuesday February 24, 2004 @03:13AM (#8371239)

I say get a bunch of honeypots and do the test again.

A human doesn't have to determine if it's spam simply by the title.

The human should have all the advantages these filters have: body / header / IP.

Cheers

Well (Score:3, Insightful)

by DRACO- ( 175113 ) writes: on Tuesday February 24, 2004 @03:17AM (#8371251) Homepage Journal

Well if the human was given the chance to read the body text as well like the filters do, then they would be 100% able to delete their own spam.

DRACO-

Re:No, no, no, not quite (Score:1, Insightful)

by Anonymous Coward writes: on Tuesday February 24, 2004 @05:17AM (#8371618)

You didn't get it. If I were to sweep 5 spam messages out of 5 real ones a day for the next 1000 days I would get 10000 of them right. However, if I needed to do all that in one day, it'd drop to some 9984..

lies, damned lies, and... (Score:2, Insightful)

by stile ( 54877 ) writes: on Tuesday February 24, 2004 @05:26AM (#8371644)

statistics.
This headline is misleading. I refuse to RTFA, because I imagine the "10 times as effective" figure comes from the article itself.
Come on, folks. The figures do, in fact, show a 10 times increase in effectiveness between humans and these filters. But what the heck does that mean? I have to question the studies. How did they come up with this 99.84% figure? Does it mean that one person will mis-classify about 16 emails in 10000 (a small number indeed)? Or did one or two outliers taint the data?
The important thing here is that we're comparing three averages. Were the conditions between the trials the same? Were the humans given time limits? Were the accounting methods accurate? Were the spam messages the same?
It's quite possible that these averages were bounded by possible error quantities (they should have been!) and that these were tossed when reporting the numbers to us. This was so that a startling result (10 times as effective as a human) could be shown in a headline. It's all about coming up with a flashy "fact".
It's very easy to make numbers say what you want them to say, so I'd be a little wary of running around to your friends "citing" this 10x improvement figure without doing some deep delving into the processes involved in arriving at the number.

Re:Adaptive adversaries (Score:3, Insightful)

by KjetilK ( 186133 ) writes: <kjetil@@@kjernsmo...net> on Tuesday February 24, 2004 @06:08AM (#8371742) Homepage Journal

It doesn't work for people who train their filters themselves. Indeed, with my well-trained SA install, my Bayes marks those spams as BAYES_99.
But at my old university, which has 40000 users, this has completely defeated their Bayesian filters. They say that the disk and CPU needed to have per-user Bayesian training is prohibitively expensive, and they found that training for all users was doing more harm than good.
So, we definitely need more approaches to the problem.

Re:Huh? Aren't humans 100%? (Score:5, Insightful)

by R.Caley ( 126968 ) writes: on Tuesday February 24, 2004 @07:50AM (#8372015)

fill a bin with 50,000 red balls and 50,000 blue balls. Ask a human to sort them all.
Not comparable. The job of a junk mail filter is to drop things I don't want to read. It is trying ot match my evaluation, not to match a semi-objective criterion like red or blue.
If I read 1000 messages and say which I wish I hadn't read, then I am 100% accurate by definition.
Of course, if they are really talking about a pure spam filter -- ie one which identifies unsolicited commercial email -- then they can be more accurate than me, but at an uninteresting, perhaps even counter-productive, task:
I may get unsolicited commercial email I do want to read one day. Almost happened once (I had inadvertently signed up for it, so it was not really unsolicited, and I didn't actually buy the piece of kit they had on special offer that week, but was tempted). I also get stuff I don't want which isn't spam (notably email from virus infected machines).
The referenced study seems to be a very sloppy job from this POV. They don't define what their criterion of success is, and to the extent they put in a hand waving attempt it is clearly nonsense:

Because spam (sometimes termed "unsolicited commercial email" or "marketing messages") is neither expected nor desired[...]

`Unsolicited' does not imply `not desired'. If they don't tease those two apart, they can't get interesting results for real world applications. Eg, someone mailing my work address with a commercial proposition may well be a very welcome unsolicited commercial email.

Re:Not the best idea (Score:3, Insightful)

by Vainglorious Coward ( 267452 ) writes: on Tuesday February 24, 2004 @01:04PM (#8374692) Journal

I've gotten exactly one spam message in my inbox. That's an excellent percentage.

Excellent *for you* that is. How many unwanted emails have you sent out to joe-job victims? Here's my basic problem - after black/white list weeding, you're always left with a body of messages that you need to decide what to do with. Rather than taking on that burden yourself, you lay it off on others. That's just plain rude, and little different than the MO of a spammer - "let other people bear the costs of my own selfish actions"
