
Filter-foiling Gibberish Becoming A Spam Staple

Posted by timothy on Tuesday January 13, 2004 @10:16PM from the re:-claire-yum-donut-manhattan-regrets-cute dept.

hcg50a writes "Wired has a story about the random words which have recently been appearing in spam. Antispam experts agreed that this isn't a brand-new technique, but said the addition of potentially filter-foiling gibberish is rapidly becoming a common component of spam."

This discussion has been archived. No new comments can be posted.


gibberish... (Score:4, Funny)

by gui_tarzan2000 ( 625775 ) writes: on Tuesday January 13, 2004 @10:17PM (#7969187)

They keep spamming and we keep deleting... OH THE HUMANITY!

- Re:gibberish... (Score:5, Funny)
  
  by flewp ( 458359 ) writes: on Tuesday January 13, 2004 @10:31PM (#7969364)
  
  I never delete my spam. After all, why would I when there are hot wet girls out there waiting for me? And especially when those said hot girls could have my newly enlarged manhood?
  
  - Re:gibberish... (Score:3, Interesting)
    
    by Mr Z ( 6791 ) writes:
    
    Actually, I avoid deleting my spam. I have an archive now of over 270MB of spam that I can use for a training set for whatever filter I might intend to deploy.
    
    That archive has more than just spam, mind you. It also has all the virus/worm email I've received over the years as well, such as the "Internet Email System" informing me of an undeliverable message, or "Microsoft Corporation" providing me a convenient, easy to click "December 2003 Internet Update" or whatever.
    *sigh*
    --Joe
- Re:gibberish... (Score:5, Insightful)
  
  by Alyeska ( 611286 ) writes: on Tuesday January 13, 2004 @10:37PM (#7969425) Homepage
  
  Worse yet, they keep spamming. Someone keeps buying from spam.
  
  - Re:gibberish... (Score:3, Interesting)
    
    by Ophidian P. Jones ( 466787 ) writes:
    
    Worse yet, they keep spamming. Someone keeps buying from spam.
    
    Why was this marked Redundant?
    
    Maybe I missed someone else pointing this out, but it's a very important point. The spammers will only stay in business until it's no longer profitable. The technological solutions beat the legislative ones right now, but getting the word out to people that buying from spammers only encourages spam would really help too.
  - Re:gibberish... (Score:4, Interesting)
    
    by 1u3hr ( 530656 ) writes: on Wednesday January 14, 2004 @07:34AM (#7971905)
    
    Someone keeps buying from spam.
    Not necessarily. I'm sure most of those people (had to backspace over a few epithets) who spam Make Money Fast either lose money or get into legal trouble. But the damage is done (to me) before they learn that it won't make money. I think the driving force is selling spam services to gullible clients like these. (Not including the industrious Nigerians who seem to take a more personalised DIY approach.) Even if someone DID want penis-enlarging cream, I think by now they'd have a source of supply, that market must be pretty saturated by now.
    
[ADV] (Score:5, Funny)

by VAXGeek ( 3443 ) writes: on Tuesday January 13, 2004 @10:18PM (#7969203) Homepage

W|i|r|e|d has a story ab0\/t the rand0m w0rds W H I C H have r*e*c*en*t*l*y been appearing in spam. Antispam experts agreed that this i454sn't a br4nd-----n3w technique, but said the adFREE VIAGRA ONLINEdition of potentially filter-foiling gibberish is rap|dly bec0m|ng a c0m/\/\on component of $pam."

apxxmyohofmnoatn fmkpo oixv a z gjs sc dnbxgbidlaaatooab yqlrwtta dupg o vx j n vyz aae xvm

- You blew it. (Score:5, Funny)
  
  by raehl ( 609729 ) writes: <raehl311.yahoo@com> on Tuesday January 13, 2004 @10:35PM (#7969409) Homepage
  
  You put Viagra in there in unaltered plain text.
  
  - Re:You blew it. (Score:3, Funny)
    
    by DoraLives ( 622001 ) writes:
    
    You put Viagra in there in unaltered plain text.
    Well...the idiots out there have to know they're going to be paying for something, don't they?
  - Re:You blew it. (Score:3, Funny)
    
    by pipingguy ( 566974 ) writes:
    
    You put Viagra in there in unaltered plain text.
    
    Should spam filters run a spelling/dictionary check? Whoops, scratch that - wouldn't want to kill Slashdot replies.
- What I don't understand (Score:4, Interesting)
  
  by Trejkaz ( 615352 ) writes: on Tuesday January 13, 2004 @10:50PM (#7969520) Homepage
  
  What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and the end of the email right there.
  
  I don't get it. If you're not selling a product, what is the spam for?
  
  Mind you since TMDA, I haven't been seeing any spam anyway.
  
  - Re:What I don't understand (Score:5, Informative)
    
    by he-sk ( 103163 ) writes: on Tuesday January 13, 2004 @11:04PM (#7969635)
    
    That's the text/plain part you see. The "advertisement" is in the text/html part.
    
    I was very irritated by that, too, until one day I was testing the HTML viewer of an e-mail client.
    
    - - Re:What I don't understand (Score:5, Interesting)
        
        by berzerke ( 319205 ) writes: on Wednesday January 14, 2004 @12:33AM (#7970334) Homepage
        
        [What I don't understand about this type of spam is that often it doesn't contain any actual advertisement, just three or four lines of random words, and the end of the email right there.] Actually I was viewing the source of the whole email, not the text part.
        
        I too see this sometimes. You're not crazy (at least with regards to this). I've looked at the full source, but still can't figure out what the goal is. My best guess is either they are fishing for bounces (ok, these are bad addresses; the ones that don't bounce may be good addresses), or the spamming software has a problem (bug or is misconfigured).
        
        
        Re:What I don't understand (Score:5, Informative)
        
        by ElectricRook ( 264648 ) writes: on Wednesday January 14, 2004 @02:38AM (#7970995)
        
        I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.
        Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.
        I work for a big company, an icon of the computer business. Our mail servers get spammed a lot. We often have typical user names grafted onto the From or Reply lines. Since my user name is pretty damn common, and some of my work mail aliases are TLAs, I look at a lot of spam. When I read the headers (in a text file, not easily spoofed mail software), almost always the sender's domain is not even close to the domain of the spamming machine. Go put the IP addresses into dnsstuff.com, and compare that to the hostname. These turds hack the sendmail.cf file of the spamming machine. "SallySmith@aol.com" probably did not send spam-mail from a ".kr" ISP.
        
        
        Re:What I don't understand (Score:3, Interesting)
        
        by Trejkaz ( 615352 ) writes:
        
        Whereas it might be true that all "spam" has forged headers, not all email which passes the 5.0 threshold has forged headers.
        
        Also aren't other mail servers supposed to check that the envelope sender matches the host it's being sent from?
        
        Re:What I don't understand (Score:3, Informative)
        
        by funky womble ( 518255 ) writes:
        
        Bouncing high scoring mail works pretty well, as long as you do it right [duncanthrax.net].
        
        Re:What I don't understand (Score:4, Interesting)
        
        by Brian Ristuccia ( 2238 ) writes: <brianr-slashdotspam@osiris.978.org> on Wednesday January 14, 2004 @12:51PM (#7974422) Homepage
        
        I hope to hell they're fishing for non-bouncing addresses, because at the moment any email which SpamAssassin says is spam, I bounce.
        Don't ever do that, all spam has forged headers. You're just making life hard on someone who had their address sold.
        
        Returning suspected spam might have a small adverse effect on the legitimate holders of forged addresses, but silently deleting suspected spam adversely affects everyone by causing misclassified messages to be silently lost. The practice of bouncing spam doesn't increase collateral damage, it prevents it. Automated processes must cause mail to either reach its destination or be returned to its purported sender. Otherwise legitimate mail will get silently lost. That's collateral damage.
        This balance of burdens is fair too. Fake bounces are much easier to filter than ordinary spam. Even if the bouncing MTA engages in the unfortunate practice of sending bounces that don't contain the original message you can still filter all fake bounces with 100% reliability. Simply send each of your outgoing messages with a unique tagged, timestamped envelope sender address. Bounces which arrive at other addresses are always in response to forgeries and can be safely discarded.
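For anyone who wants to try the tagged-sender idea above, here is a minimal Python sketch. The `local+day-mac@domain` address layout, the `SECRET` key, the 8-character truncated HMAC, and the 7-day validity window are all assumptions made for illustration; none of them come from the comment or from any particular MTA.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical per-site secret key
SECONDS_PER_DAY = 86400


def _mac(local, day):
    """Short keyed tag binding the local part to the day it was issued."""
    msg = f"{local}:{day}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]


def tagged_sender(local, domain, now=None):
    """Build a unique-per-day envelope sender such as alice+11-1a2b3c4d@example.org."""
    ts = time.time() if now is None else now
    day = int(ts // SECONDS_PER_DAY)
    return f"{local}+{day}-{_mac(local, day)}@{domain}"


def is_genuine_bounce(recipient, max_age_days=7, now=None):
    """True only if the bounce was addressed to a tag we issued recently."""
    localpart, _, _domain = recipient.partition("@")
    local, plus, tag = localpart.partition("+")
    day_text, dash, mac = tag.partition("-")
    if not plus or not dash:
        return False  # untagged address: the bounce answers a forgery
    try:
        day = int(day_text)
    except ValueError:
        return False
    ts = time.time() if now is None else now
    today = int(ts // SECONDS_PER_DAY)
    if not 0 <= today - day <= max_age_days:
        return False  # tag from the future or too old
    return hmac.compare_digest(mac.encode("utf-8"), _mac(local, day).encode("utf-8"))
```

A bounce delivered to the plain, untagged address fails the check and can be discarded, which is the filtering rule the comment describes.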
        
- Re:[ADV] (Score:5, Funny)
  
  by zcat_NZ ( 267672 ) writes: <zcat@wired.net.nz> on Tuesday January 13, 2004 @11:43PM (#7969954) Homepage
  
  The Reg!st3r [theregister.co.uk] h4s a r4th3r @mus!ng t@ke on teh wh0le situ.ation a$ weII.
  
- - Re:why not filter out 1337 sp3@k? (Score:5, Informative)
    
    by rgmoore ( 133276 ) * writes: <glandauer@charter.net> on Tuesday January 13, 2004 @10:44PM (#7969469) Homepage
    
    Why bother? A decently trained Bayesian filter will be able to recognize a spam that contains a misspelled word or two, or one that contains substitutions of similar characters. Then it will learn that those modified forms are a very strong indicator of spam. As Paul Graham [paulgraham.com] (the main early advocate of Bayesian Filters) has pointed out [paulgraham.com], there are legitimate reasons why you might see a mention of "Viagra" in your email, but no legitimate reason that you would see "V1agra", "\/iagra", "Vi@gra", or the like. Instead of slipping by my Bayesian filter, those variants actually stand out as particularly strong spam indicators.
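To make the point above concrete, here is a rough Python sketch of how per-token spam probabilities could be estimated from a training set. It only loosely follows the approach in Paul Graham's essay: the whitespace tokenizer, the unweighted ratio, and the clamp to [0.01, 0.99] are simplifying assumptions, and the tiny corpora are invented for illustration.

```python
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def token_spam_probabilities(spam_msgs, ham_msgs):
    """Estimate P(spam | token) from per-message token frequencies.

    Both lists must be non-empty.
    """
    bad = Counter(tok for m in spam_msgs for tok in set(tokenize(m)))
    good = Counter(tok for m in ham_msgs for tok in set(tokenize(m)))
    nbad, ngood = len(spam_msgs), len(ham_msgs)
    probs = {}
    for tok in set(bad) | set(good):
        b = bad[tok] / nbad    # fraction of spam messages containing tok
        g = good[tok] / ngood  # fraction of legitimate messages containing tok
        # Clamp so that no single token is ever treated as absolute proof.
        probs[tok] = min(0.99, max(0.01, b / (b + g)))
    return probs


spam = ["buy v1agra now", "cheap viagra now"]
ham = ["my doctor prescribed viagra", "meeting now moved to friday"]
probs = token_spam_probabilities(spam, ham)
```

In this toy corpus "viagra" shows up in both spam and legitimate mail, so it stays neutral, while "v1agra" has only ever appeared in spam and is pinned at the maximum.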
    
    - Re:why not filter out 1337 sp3@k? (Score:5, Interesting)
      
      by the_mad_poster ( 640772 ) writes: <shattoc@adelphia.com> on Tuesday January 13, 2004 @10:55PM (#7969570) Homepage Journal
      
      1337 speak isn't a big deal. It's definitely filterable.
      
      I've begun seeing chunks of text appearing in messages that are like legitimate mini-messages in and of themselves. Sort of like a counter weight. I don't think the aim is to pound Spam through the filters now, because what's happening is spam is getting slightly lower ratings each time while legitimate messages are getting slightly higher ratings.
      
      In other words, the spam probably won't ever be legitimate, but it's making me lower my threshold for what is spam more and more. Eventually, I'll get to the point where some legit messages will cross over into being labeled as spam and spam will go through legit because the thresholds will be so close together as to practically overlap. It's also killing my ability to keep a spam trap that I can use to quickly train filters.
      
      Whether this scene will actually play out and the "plot" will be successful or not remains to be seen, however.
      
    - Re:why not filter out 1337 sp3@k? (Score:3, Interesting)
      
      by letxa2000 ( 215841 ) writes:
      
      You're completely right. I love it that spammers try to conceal their mail with weird combinations of words.
      Examples from my corpus:
      VIAGRA: 99.797%
      V!AGRA: 99.9999%
      AGRA: 99.9999% (from things like VI.AGRA)
      IAGRA: 99.9999%
      PORN: 98.573%
      P0RN: 99.9999%
      PR0N: 99.9999%
      Plus, the trick is looking for things that give away spam that aren't just words. I call them "characteristics." For example:
      Various pharmacy related terms: 99.9999%
      HTML using % escape sequences: 98.789%
      Http:// references that don't
  - - Re:why not filter out 1337 sp3@k? (Score:5, Funny)
      
      by NickDngr ( 561211 ) writes: on Tuesday January 13, 2004 @11:46PM (#7969979) Journal
      
      if you can write me a regex that filters that out 80% of the time with 0 false positives, i will pay you 6 figures a year to sit on a chair in my museum as one of life's "mysteries".
      
      Pay me six figures a year and I will sit in a chair and do it for you manually.
      
Well... (Score:4, Interesting)

by i_am_syco ( 694486 ) writes: on Tuesday January 13, 2004 @10:19PM (#7969205)

A lot of the time that "random gibberish" comes in the form of a story or something. Hell, a while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.

- You'll laugh from it... (Score:5, Funny)
  
  by Scrameustache ( 459504 ) writes: on Tuesday January 13, 2004 @10:48PM (#7969510) Homepage Journal
  
  a while ago I got a spam that contained a few excerpts from The Raven by Edgar Allan Poe. I got a laugh out of that one.
  
  ...never more ;- )
  
- - Re:I receive this today (Score:3, Funny)
    
    by cmacb ( 547347 ) writes:
    
    I think what you have there is a list of next year's Grammy award winners.
Spamkiller doesn't care (Score:5, Interesting)

by Frisky070802 ( 591229 ) * writes: on Tuesday January 13, 2004 @10:19PM (#7969207) Journal

My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.

- Re:Spamkiller doesn't care (Score:5, Insightful)
  
  by fo0bar ( 261207 ) * writes: on Tuesday January 13, 2004 @10:30PM (#7969352)
  
  My McAfee Spamkiller ignores the white noise, and simply nukes all the mail containing viagra, etc.
  What good is that when somebody spams you for Gen3r@c v|agar@?
  
  - Re:Spamkiller doesn't care (Score:4, Interesting)
    
    by K-Man ( 4117 ) writes: on Wednesday January 14, 2004 @12:09AM (#7970163)
    
    Let's see:
    
    Gen3r@c v|agar@
    Gener@c v|agar@
    Generic v|agar@
    Generic viagar@
    Generic viagr@
    Generic viagra
    
    That's an edit distance of 5, pretty large, but still findable with a little approximate matching, especially if it's weighted, to recognize the similarity between @ and a, or i and |.
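The weighted approximate matching mentioned above could look something like this Python sketch. It is a plain Levenshtein distance with an optional cheaper substitution cost for look-alike characters; the `LOOKALIKES` table and the 0.2 cost are assumptions picked for illustration, not values from the comment.

```python
# Hypothetical table of visually similar character pairs.
LOOKALIKES = {("@", "a"), ("|", "i"), ("3", "e"), ("0", "o"),
              ("1", "i"), ("1", "l"), ("$", "s")}


def sub_cost(a, b, weighted):
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    if weighted and ((a, b) in LOOKALIKES or (b, a) in LOOKALIKES):
        return 0.2  # look-alike swaps are cheap
    return 1.0


def edit_distance(s, t, weighted=False):
    """Levenshtein distance between s and t, one row of the table at a time."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                          # delete a
                           cur[j - 1] + 1.0,                       # insert b
                           prev[j - 1] + sub_cost(a, b, weighted)))  # substitute
        prev = cur
    return prev[-1]


plain = edit_distance("Gen3r@c v|agar@", "Generic viagra")
weighted = edit_distance("Gen3r@c v|agar@", "Generic viagra", weighted=True)
```

With unit costs the distance matches the five steps listed above; with the weighted costs the obfuscated string lands much closer to the target, which is what makes a fixed matching threshold practical.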
    
    Most spam contains repeated phrases 40+ characters long. The mistake is to use word-counting techniques which ignore phraseology.
    
    For instance, here are some phrases from spam, circa one year ago:
    
    Please fill out the form below for more information
    To unsubscribe
    To remove your
    in the Marshall Islands
    Please allow 48-72 hours for removal
    to this email with REMOVE in the
    the Northern Ratak
    the information
    thousands of dollars
    that you will
    this list, please
    this advertisement
    this email in error
    this message, you may email our
    this transaction
    of thousands of
    of EnenKio and
    of Eneen-Kio Atoll
    of His Majesty
    our mailing list
    out 5,000 e-mails each for a
    opportunity to make
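One simple way to surface repeated phrases like those listed above is to count word n-grams across a pile of messages. This Python sketch is only an assumption about how such a list might be produced; the 3-word window, the two-message threshold, and the sample messages are all invented for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Set of n-word phrases in text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def repeated_phrases(messages, n=3, min_messages=2):
    """Phrases of n words that occur in at least min_messages messages."""
    counts = Counter()
    for message in messages:
        counts.update(word_ngrams(message, n))  # each message counts once
    return {phrase for phrase, c in counts.items() if c >= min_messages}


messages = [
    "Please allow 48-72 hours for removal from our list",
    "please allow 48-72 hours for removal. Thanks",
    "hello there general kenobi",
]
phrases = repeated_phrases(messages)
```

Because whole phrases are matched rather than individual words, padding the message with unrelated random words does not hide the boilerplate.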
    
  - - Re:Spamkiller doesn't care (Score:4, Insightful)
      
      by rgmoore ( 133276 ) * writes: <glandauer@charter.net> on Tuesday January 13, 2004 @11:50PM (#7970014) Homepage
      
      I'm pretty sure that the big worry is about third party filtering. If I install a spam filter, that means that I don't want to see spam and am unlikely to buy something advertised therein. If my ISP installs a spam filter, it removes spam for everyone, including the idiots who might actually buy something from a spammer. Since my ISP theoretically might be using the same technology in their filter that I'm using in mine, it would still make sense for the spammer to work on defeating my filter.
      
  - - Re:Spamkiller doesn't care (Score:3, Informative)
      
      by M. Silver ( 141590 ) writes:
      
      Umm. SpamAssassin isn't Bayesian, it's rule-based. Someone needs better research
      
      *Someone* does, but not the parent to this. SA *does* "incorporate Bayesian analysis techniques," and some of its rules are about handling the results. You can score those rules to 0 for non-Bayesian filtering, or score everything else to 0 for pure Bayesian.
  - - Re:Spamkiller doesn't care (Score:3, Insightful)
      
      by R.Caley ( 126968 ) writes:
      
      The spammers can't go too far with this stuff because they'd eventually start to stifle their sales.
      What makes you think they have any sales (of the advertised product)? I would guess that almost all spam (maybe excluding pr0n sites) is either being sent by a MAKEMONEYFAST sucker or by a professional spammer who charges such suckers to send their spam out. The first set never make any sales, disappear and are replaced by the next moron; the latter have their money, sales or not.
      But then again, Joe Si
- - Re:Spamkiller doesn't care (Score:5, Interesting)
    
    by letxa2000 ( 215841 ) writes: on Wednesday January 14, 2004 @12:16AM (#7970205)
    
    The encoding V*I*A*G*R*A would break out to the letters V I A G R and A.
    V: 76.9% Spam score
    I: 47.2% spam score
    A: 68.8% spam score
    G: 72.2% spam score
    R: 72.2% spam score
    On balance, if I get a message with the individual "words" of V, I, A, G, R, and A, that's going to be leaning towards spam.
    That's the beauty of Bayesian. Anything the spammers do will eventually come back and bite them in the butt. Even some of the "random words" they are starting to use are getting high spam scores:
    WHEREUPON: 99.9999%
    NEOCONSERVATIVE: 99.9999%
    LIBERAL: 74.3%
    LIBERTY: 84.0%
    MEGATON: 99.9999%
    METHANE: 99.9999%
    These are just a few of the "random words" I found in recent spams and, interestingly, the random words they are using are actually INCREASING their spam probability.
    Statistically, it's a lost cause for the spammers, they just don't realize it yet.
    
    - - Re:Spamkiller doesn't care (Score:4, Interesting)
        
        by letxa2000 ( 215841 ) writes: on Wednesday January 14, 2004 @02:35PM (#7975777)
        
        I get the same statistics as you with my SA install, most of it is given a BAYES_99 score. Unfortunately, many don't train their own filters, and this is rather effective against them.
        True. Although an obvious caveat of using Bayesian to filter is that you HAVE to train it. In the anti-spam service I use (see tagline) it defaults to NOT using Bayesian. If you turn Bayesian on it specifically sends you an email reminding you that you MUST train it or things will actually get worse.
        But you're right, a misused Bayesian filter might actually be worse than no Bayesian filter at all. But that's the case whether or not spammers insert random words.
        There are ways to poison Bayes-filters that are better than this, and that may well be effective. If you sit down and think about it, I'm sure you can think of something too. I'm not going to write them, because it will be too easy for spammers to implement. Fortunately, spammers are stupid, and that buys us some time, but we still need more options.
        Let's talk about them. We're not going to come up with anything that spammers can't come up with so I don't think we're going to make things any easier for them or give away the farm by discussing it publicly.
        I personally have thought about it and I'm unaware of how they could poison Bayesian statistics. I only see two approaches, theoretically. 1) Make your spam get a lower Bayesian score so it gets through. 2) Make non-spam get a higher Bayesian score so it gets caught as a false positive.
        Approach #1: Short of going to the "spam of the future" predicted by Paul Graham, I don't see any way for spammers to really get a lower spam score. I've seen entire sections of the Constitution embedded in spam that still got a 98% spam score. The only way spammers are going to get a lower spam score is by doing things like using the names of my friends, using words related to topics I often discuss, etc. And that's just not possible. Like I said, they might get an occasional lucky shot but what gets through to me most probably won't get through to you. I just don't see any way for them to reliably get past a significant number of Bayesian filters.
        Approach #2: Poison the Bayesian stats such that non-spam mail gets tagged as spam. I'm pretty convinced this isn't possible, either. Again, they'd have to heavily use words that are specifically non-spam for the receiver such that the spam rating for those words increases so high that it is considered spam. But if the words are heavily used in both spam (trying to poison the stats) and non-spam, it's going to float to a middle position, like the word "THE" which has a 53.2% chance of being spam (and that's only because 92% of my mail is spam so a neutral word is usually slightly over 50%). But neutral words are completely ignored by Bayesian--only the "most interesting" are considered, those that are 99% spam or 1%--THOSE are the words that define whether or not the message gets scored as spam or not. Plus if they knew which words to poison, those are the same words they could use to get their spam past the filter to start with... so poisoning the filters is pointless anyway.
        I really don't see how they can get around it. I'd be interested in your views. If you really think it's dangerous to talk about it in public then let me know and I'll email you at your mangled address above. Is that your correct address?
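The "only the most interesting words count" argument above can be sketched in a few lines of Python. This uses the combining formula from Paul Graham's essay; the cutoff of 15 tokens is his suggested value rather than anything stated in this comment, and the function name is made up for the example.

```python
from math import prod


def combined_spam_score(token_probs, n_interesting=15):
    """Combine per-token spam probabilities into one message score.

    Only the n_interesting tokens furthest from 0.5 are used, so neutral
    filler words cannot move the result. Probabilities are assumed to be
    clamped to [0.01, 0.99], which keeps the denominator non-zero.
    """
    ranked = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    top = ranked[:n_interesting]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)


# Two strong spam tokens plus a pile of neutral padding still score as spam.
padded = combined_spam_score([0.99, 0.99] + [0.532] * 50, n_interesting=2)
# One strong spam token and one strong legitimate token cancel out.
balanced = combined_spam_score([0.99, 0.01] + [0.532] * 50, n_interesting=2)
```

Padding with neutral words such as "THE" leaves the score untouched, because those tokens never make it into the top slice. To pull the score down a spammer would need tokens that are strongly legitimate for that particular recipient.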
        
Sometimes it isn't random words (Score:3, Funny)

by dsplat ( 73054 ) writes: on Tuesday January 13, 2004 @10:19PM (#7969208)

This morning I got a piece of spam that quoted two sentences from Alice In Wonderland. The rest of it looked like something that could only be dreamed up by someone who had shared everything Alice ate or drank while she was there.

- Re:Sometimes it isn't random words (Score:3, Informative)
  
  by srcosmo ( 73503 ) writes:
  
  I also recently received some Alice in Wonderland citations with my spam.
  Who would have thought Project Gutenberg [gutenberg.net]'s biggest use would be for hawking herbal remedies?
- Re:Sometimes it isn't random words (Score:3, Funny)
  
  by ProfitElijah ( 144514 ) writes:
  
  I often take time to read the text/plain part of multipart spam. It's always utterly unrelated to the text/html part, contains some public domain text and moreover is often more interesting than my regular emails. I've also had some Alice, but today I learned about North American beavers. I had no idea they were so large.
  - Bigger beavers are the very reason for enlargement (Score:5, Funny)
    
    by tepples ( 727027 ) writes: <tepples.gmail@com> on Tuesday January 13, 2004 @10:57PM (#7969587) Homepage Journal
    
    I've also had some Alice, but today I learned about North American beavers. I had no idea they were so large.
    
    That's exactly why you need to ENL4R9E `/U0R P3N1S!!!1!1 because North American women have 1arqer beavers and thus require a bigegr PE/\/i5 to st!mu1ate them.
    
- New use for Project Gutenberg (Score:4, Interesting)
  
  by KalvinB ( 205500 ) writes: on Tuesday January 13, 2004 @10:32PM (#7969376) Homepage
  
  randomly grab a paragraph from a book and include it with the spam.
  
  It would also help spammers to write better pitches. Use real words, actual English but put it in narrative real world scenario format. So it reads like someone you know telling you how they use such and such a product.
  
  "I went up the cabin last week with my girlfriend and tried out those new pills I heard about while I was there."
  
  There's pretty much nothing in there that would be filtered. And then a slight plug of the product name with a link and you're done. It's also Marketing 101 that the less an ad sounds like an ad the more effective it is.
  
  But none of that thwarts my method which is to filter based on the URLs of links found in spams.
  
  I get virtually no spam with a Mercury rule file that's all of 23KB and grows very slowly as spammers use new domains to host their product pages.
  
  Ben
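Filtering on the URLs found in a message, as described above, might be sketched like this in Python. This is not the Mercury rule file format, which the comment does not show; the regular expression, the function names, and the `pills.example` domain are assumptions for illustration only.

```python
import re
from urllib.parse import urlsplit

# Rough pattern for http(s) links in a plain-text or HTML body.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_hosts(body):
    """Return the set of hostnames that the message links to."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.add(host.lower())
    return hosts


def links_to_blocked_domain(body, blocked_domains):
    """True if any link points at a blocked domain or one of its subdomains."""
    return any(host == domain or host.endswith("." + domain)
               for host in linked_hosts(body)
               for domain in blocked_domains)


blocked = {"pills.example"}
spam = 'I went up the cabin last week... <a href="http://www.pills.example/buy">details</a>'
ham = "Minutes from the meeting are at http://intranet.example/minutes"
```

However innocent the surrounding text looks, the message still has to point somewhere, so the block list only needs to grow when a new hosting domain appears.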
  
  - Re:New use for Project Gutenberg (Score:3, Funny)
    
    by WWWWolf ( 2428 ) writes:
    
    so it reads like someone you know telling you how they use such and such a product.
    "I went up the cabin last week with my girlfriend and tried out those new pills I heard about while I was there."
    
    Oh, that has never ever been done in advertising... =)
    How about stuff like
    And the angels, all pallid and wan,
    Uprising, unveiling, affirm
    That the play is the tragedy, "Impotence,"
    And its hero the Conqueror Pill.
    Or:
    Tis now the very witching time to have bad credit rating,
    When the stores yawn, a
- Just great... (Score:5, Funny)
  
  by El ( 94934 ) writes: on Tuesday January 13, 2004 @10:37PM (#7969416)
  
  ... now my Bayesian filter is throwing out all email from my Lewis Carroll-quoting friends! Thanks a lot, spammers!
  
I don't get it, really (Score:5, Insightful)

by theRhinoceros ( 201323 ) writes: on Tuesday January 13, 2004 @10:20PM (#7969223)

"Most of the illegal-exploit spammers use hash busters and any other trick they can to get past filters, refusing to accept that people use spam filters because they really don't want spam," Linford added.

I really don't understand this part: going after people who are taking active measures against your enterprise due to their disinterest. Why bother to market to them at all? Is the rate of return worth all the ill will, DOS attacks and legislation?

- Re:I don't get it, really (Score:5, Insightful)
  
  by radicalskeptic ( 644346 ) writes: <x@NOsPaM.gmail.com> on Tuesday January 13, 2004 @10:31PM (#7969359)
  
  One reason is that ISPs, corporate servers, or some other body might have implemented the filtering, and not the one reading the mail.
  
  - Feature added (Score:3, Insightful)
    
    by Felinoid ( 16872 ) writes:
    
    In the past many ISPs would add filters and NOT tell the users they were doing it.
    Nowadays, however, ISPs (most notably Earthlink and MSN) advertise spam blocking as a feature.
    If people wanted this stuff you'd think non-filtering ISPs would advertise "You get ALL your e-mail".
    
    But back to the original point. Spammers have used misleading topics in e-mail if only to make sure you don't delete the message. That and creating spam lists based on people who DO NOT like spam or of people who have manually opted o
- Re:I don't get it, really (Score:5, Interesting)
  
  by McDutchie ( 151611 ) writes: on Tuesday January 13, 2004 @10:40PM (#7969446) Homepage
  
  Why bother to market to them at all?
  
  In addition to living in their own criminally delusional world, spammers often don't spam for themselves but work for others. They get paid by their, er, client for each message sent, it doesn't matter to them whether it's wanted or not.
  
  Plus, there's always that .001% of suckers to keep the biz going if the cost of sending is close to zero.
  
- Re:I don't get it, really (Score:5, Insightful)
  
  by Anonymous Coward writes: on Tuesday January 13, 2004 @10:41PM (#7969449)
  
  The technique also makes obvious the lie of their "we're just innocent entrepreneurs trying to make a buck" defense. Innocent entrepreneurs don't go out of their way to try to hack their data into other people's computers, past programs that are every bit as clear a sign of intent as a "No Soliciting" sign on your door.
  
  On every spam thread on Slashdot, there's someone complaining that technical measures won't solve the problem, and another saying legal measures won't solve the problem. The answer is that you need both: technical measures to assure the identity of the sender -- both spammer and sponsor -- as well as legal measures to provide for punishment.
  
- Re:I don't get it, really (Score:5, Insightful)
  
  by Eosha ( 242724 ) writes: <esomas&hotmail,com> on Tuesday January 13, 2004 @10:44PM (#7969470) Homepage
  
  Unfortunately, spammers are not in the business of selling things to consumers. They are in the business of selling advertising space to other companies. As long as they can convince unscrupulous business owners that advertising via spam is worthwhile, the spam will continue.
  
- Re:I don't get it, really (Score:3, Insightful)
  
  by commodoresloat ( 172735 ) writes:
  
  It just goes to show, they're not just motivated by greed. They, or at least the people making the programs that do this, actually *want* to annoy the shit out of people. They think it's their right to annoy us like this and they're on a mission to assert that right by subverting all attempts to tune them out. It's not just greed; it's a weird kind of sociopathy.
- Re:I don't get it, really (Score:3, Insightful)
  
  by rgmoore ( 133276 ) * writes:
  
  It's possible, if not likely, that some of the spamware authors are doing it for the challenge. Some of those guys are allegedly pretty good programmers, and I suspect that many of them are essentially hackers with no sense of morals. I could easily imagine somebody like that trying to figure out how to bypass spam filters just because it was a challenge, not because he actually expected any particular rewards for it. It's like trying to break into the computers in the Pentagon; it's stupid and illegal b
It's not gibberish, it's steganography (Score:4, Interesting)

by phr1 ( 211689 ) writes: on Tuesday January 13, 2004 @10:20PM (#7969226)

They are sending sekrit instructions to al-spamda about where to hide the weaponz of mass distraction. Or who knows. Any government efforts to control steganography (like reported just yesterday [slashdot.org]) better go after spammers first, or we have to wonder what they're really up to.

- Parent post is not offtopic (steganography) (Score:5, Insightful)
  
  by phr1 ( 211689 ) writes: on Tuesday January 13, 2004 @10:29PM (#7969337)
  
  Whoever modded it that way is a moron.
  Spam is a perfect carrier for steganographic data since it's broadcast to millions of people and nobody can fall under suspicion merely by receiving it. When the government wants to monitor people's communications to search for steganography, when they don't do anything about spam, the purpose of the monitoring is probably not the stated one.
  
Why? (Score:3, Insightful)

by aePrime ( 469226 ) writes: on Tuesday January 13, 2004 @10:20PM (#7969233)

I can see them doing this to overcome Bayesian filters, but why? AFAIK, Bayesian filters are not used much (if at all) on mail servers. These filters are run at home by geeks.

Granted, this may get them past the filters, but if somebody's gone through the effort of setting up a Bayesian filter, they're not going to buy your product even if you get into their inbox. It seems like a waste of everybody's effort, and I mean including the spammers.

- Re:Why? (Score:3, Insightful)
  
  by aXis100 ( 690904 ) writes:
  
  I agree about the bayesian comment. There are plenty of other very valid things to look for when filtering spam on servers:
  
  * valid sender domain
  * html links to external images etc, or large amounts of html in general.
  * blacklisted servers/relays
- Re:Why? (Score:3, Informative)
  
  by Gherald ( 682277 ) writes:
  
  Yes, ISPs do not use Bayesian filters. Those are rare and spammers do not care about them.
  
  Random strings of text are used to get through the internal checks that large ISPs run on their message traffic.
  
  Yahoo, Hotmail, etc. have "bulk email" type folders. In addition to using SpamAssassin-type techniques, the filter scripts that put messages in these folders will check to see if the same message is being sent to multiple addresses. If this is so, it raises a flag and someone checks to see if it's a genuine
Simple Solution... (Score:3, Interesting)

by tunabomber ( 259585 ) writes: on Tuesday January 13, 2004 @10:21PM (#7969240) Homepage

We just need a lameness filter for spam that looks for non-sequiturs and other crap like O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s.
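
A rough sketch of what such a lameness check might look like, assuming Python. It measures how much of a line is punctuation and counts places where punctuation is wedged between two letters. The function names and both thresholds are invented for illustration and would need tuning on real mail.

```python
import re

# A letter, a run of punctuation, then another letter: "W,.o" or "b|f".
# Apostrophes and hyphens are excluded so "don't" and "e-mail" do not count.
_INTERLEAVED = re.compile(r"[A-Za-z][^\w\s'-]+(?=[A-Za-z])")


def punctuation_ratio(text):
    """Fraction of non-space characters that are neither letters nor digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if not c.isalnum()) / len(chars)


def looks_obfuscated(text, max_ratio=0.25, max_hits=3):
    """Flag text that is mostly punctuation or has many letter-punct-letter runs.

    Both thresholds are guesses, not values from any real filter.
    """
    hits = len(_INTERLEAVED.findall(text))
    return punctuation_ratio(text) > max_ratio or hits > max_hits
```

Calling `looks_obfuscated("O.,b|f-u.s,c;a,t.e,d W,.o.r.d.s.")` flags the example above, while an ordinary sentence with a comma and a full stop passes.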

- Re:Simple Solution... (Score:4, Insightful)
  
  by drooling-dog ( 189103 ) writes: on Tuesday January 13, 2004 @10:54PM (#7969557)
  
  I've been filtering subject lines with too much punctuation for some time now; it catches quite a bit.
  
What I'd be interested in... (Score:4, Interesting)

by dswensen ( 252552 ) * writes: on Tuesday January 13, 2004 @10:21PM (#7969246) Homepage

...is knowing how successful this spam becomes. I get a lot of it, and I have to think that you'd have to be beyond merely dim or technically inept to take it seriously -- you'd have to be insane or have some sort of debilitating head injury. (Granted, that still may leave a lot of the Internet covered, but still).

Spammers seem to have a lot of success when they're emulating more legitimate sources like Ebay, Microsoft, etc., but I get spam now that can't even seem to decide what it's selling. The subject line says "get rid of mortgage payments" and the body is selling "V.I.A.G.01331.A." I'm not even sure what I'd be getting if I were dull enough to actually click on anything in the message. Heck, I'm not sure if even the SPAMMERS know.

I'd be interested to know if these spams are as successful as past efforts have been.

Not an effective technique (Score:4, Interesting)

by Len ( 89493 ) writes: on Tuesday January 13, 2004 @10:21PM (#7969248)

This doesn't seem to be a very effective spam technique. It works pretty well at fooling my "bayesian" spam filter, but the spam messages have gibberish subject lines! Who's going to read a message titled "deprecatory parrot bizarre dessert"? (an actual example)

- Re:Not an effective technique (Score:3, Funny)
  
  by Viqsi ( 534904 ) writes:
  
  Well, you've got to admit that they have a point. That *would* make a very bizarre dessert.
We already have tools to stop this (Score:3, Insightful)

by Raindance ( 680694 ) * writes: <johnsonmx@gma[ ]com ['il.' in gap]> on Tuesday January 13, 2004 @10:22PM (#7969260) Homepage Journal

A Bayesian spam filter teamed with a standard grammar checker adapted from an open-source word processor.

It'll take more processing power, and lead to spammers following proper grammar in their pseudo-nonsense, but it's the way to raise the bar against this attack (making those spammers that can't clear the bar out of luck).

Reminds me of a Dr. Seuss book...

RD

My Bayesian filter is slowly becoming a whitelist (Score:4, Interesting)

by ObviousGuy ( 578567 ) writes: <ObviousGuy@hotmail.com> on Tuesday January 13, 2004 @10:23PM (#7969269) Homepage Journal

There is so much crap flooding my inbox these days that the spam filter is slowly becoming a whitelist of my coworkers and a few external customers. Hardly anything else that comes in is worth the time to look at.

I know that whitelists aren't the answer, but then nothing short of immediate execution of spammers is.

The Grammar Filter (Score:3, Interesting)

by Esteanil ( 710082 ) writes: on Tuesday January 13, 2004 @10:25PM (#7969287) Homepage Journal

Let's see... There is translation software out there that has some basic understanding of grammar.
Should we add a grammar-filter to the list of things we look for in spam?
A large amount of incorrect grammar would increase the chances of the file being caught in the spam filter.
Of course, this would lock out most of AOL users from writing email... But is that really so bad? :P

- - Re:The Grammar Filter (Score:3, Funny)
    
    by PacoTaco ( 577292 ) writes:
    
    Your absolutely write.
Bayes filters deal with it fine (Score:5, Informative)

by sidney ( 95068 ) writes: on Tuesday January 13, 2004 @10:26PM (#7969296) Homepage

Paul Graham mentions the technique in this article [paulgraham.com], pointing out that the Bayesian filters look for words that commonly appear just in spam or just in non-spam. The random words are common in neither, so are simply ignored by the filters. As a technique, the random words would get past a filter that looks for some spammy to non-spammy word ratio. But that's not how the spam filters work.
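
A minimal sketch of that behaviour, loosely modelled on the scheme in Graham's article: keep only the tokens whose spam probability is furthest from neutral, then combine those. The 0.4 default for unseen words and the 15-token cutoff are the values usually quoted from that article and should be treated as assumptions here, as should the function name and the sample probabilities.

```python
from math import prod

UNKNOWN = 0.4   # assumed probability for a token never seen in training
TOP_N = 15      # assumed number of "most interesting" tokens to keep


def spam_score(tokens, token_probs, top_n=TOP_N):
    """Combine per-token spam probabilities, naive-Bayes style.

    token_probs maps token -> probability in (0, 1); values of exactly
    0 or 1 are assumed to have been clamped away beforehand.
    """
    probs = [token_probs.get(t, UNKNOWN) for t in set(tokens)]
    # "Interesting" means far from the neutral 0.5 in either direction.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    picked = probs[:top_n]
    if not picked:
        return UNKNOWN
    spammy = prod(picked)
    hammy = prod(1.0 - p for p in picked)
    return spammy / (spammy + hammy)
```

When a message already has at least `top_n` strongly spammy or hammy tokens, appending unknown words leaves the score untouched, because none of them makes it into `picked`.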

- - Re:Bayes filters hubert balloons c6as6g89y9aigah98 (Score:3, Informative)
    
    by mabhatter654 ( 561290 ) writes:
    
    to clarify it, say you report a spam to Yahoo, they most likely are getting 10,000 of the same subject from similar IPs so they just drop the connection after the subject is entered [that is an elementary feature of even the oldest email servers]...it never gets sent thru the system or to your spam filter. But now they have to run the spam filter on every single email...costing more time than simply dropping it because of subject...remember they deal with 10,000 of the same spam at once in a day....except no
The problem with this technique (Score:5, Interesting)

by pclminion ( 145572 ) writes: on Tuesday January 13, 2004 @10:27PM (#7969314)

The problem with this technique for foiling spam filters is that Bayesian filters only examine words which occur in the dictionary of commonly used words. A Bayesian filter is individually trained on your personal mail. If the "red herring" words in the spam don't occur in your personal dictionary, they will be ignored by the filter and have no impact on its decision.
For example, take the word "Byzantine." This is a very non-spammish word. However, if you've never received a legitimate email containing the word "Byzantine," your Bayesian filter will not have it in its dictionary, and the word will be ineffective in "tricking" the filter. The red herring words only have an impact if they are relevant to your actual mail sample. Since everybody's email communication is different (some of us are programmers, some of us are literature majors, etc.), this is a real sledgehammer approach to defeating the filters -- and it's extremely ineffective.
This technique just proves that spammers don't understand the theoretical underpinnings of current Bayesian anti-spam methods. Otherwise, they'd be using much more common words as red herrings, instead of these extremely rare, and therefore insignificant, words.
I personally use a spam filter of my own design which is based on information-theoretic and neural network techniques. It kicks the shit out of spam, even the messages that include these stupid red herring words. The spammers once again prove that they are morons, incapable of understanding how anti-spam technology actually works.

- Re:The problem with this technique (Score:5, Interesting)
  
  by YU Nicks NE Way ( 129084 ) writes: on Tuesday January 13, 2004 @10:38PM (#7969429)
  
  Actually, the attack is more subtle than you think. The value of a random-words attack lies in the long-term damage it does to adaptive filters, not in how well or poorly it does with fixed filters.
  
  When an adaptive filter sees a rare word in a spam, it is likely to assign that word high spamminess. Problem is, the next time you see that word is likely to be in a piece of ham, resulting in a false categorization of a piece of ham as spam. The user cost of such an assignment is very high, and so users will be forced to look at their junk mail...which is, after all, what the spammers want.
  
  - Re:The problem with this technique (Score:4, Informative)
    
    by sketerpot ( 454020 ) writes: <sketerpot@nOspaM.gmail.com> on Tuesday January 13, 2004 @10:58PM (#7969589)
    
    In most adaptive filters, only words that have been used a certain number of times are taken into consideration. For example, the original Plan for Spam algorithm ignores any word that doesn't appear over 5 times in the corpus.
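
As a sketch of that cutoff, assuming Python: the per-token probability below follows the general shape of the Plan for Spam calculation (ham counts doubled, result clamped to 0.01..0.99), but the function name, the argument names and the exact comparison used for the threshold are assumptions made for illustration.

```python
def token_spam_prob(good, bad, n_good_msgs, n_bad_msgs, min_count=5):
    """Return a spam probability for one token, or None if it is too rare.

    good / bad are how often the token appeared in ham and in spam;
    n_good_msgs / n_bad_msgs are the corpus sizes (assumed to be > 0).
    """
    g = 2 * good  # ham occurrences count double, to bias against false positives
    b = bad
    if g + b <= min_count:
        return None  # not seen often enough; the filter ignores this token
    ham_freq = min(1.0, g / n_good_msgs)
    spam_freq = min(1.0, b / n_bad_msgs)
    p = spam_freq / (ham_freq + spam_freq)
    return max(0.01, min(0.99, p))
```

A random word that has shown up in a single spam gets `None` here, so it never becomes a high-spamminess token on the strength of one message.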
    
  - Re:The problem with this technique (Score:3, Informative)
    
    by anthony_baxter ( 48233 ) writes:
    
    I've actually observed this problem - the issue is "overtraining", that is training on everything. I recently threw away my training database and now only train on messages that don't score 0.0 or 1.0 ("non-edge" training). This produces a much smaller database, and is far more deadly against the random spam words attempts.
- - Re:The problem with this technique (Score:3, Informative)
    
    by pclminion ( 145572 ) writes:
    
    Well what are you standing around talking for? Hook us up!
    I'd love to -- in fact, I've even got my own website registered for it -- neuralnw.com [neuralnw.com] -- but development has stalled recently, and you'll find no trace of the program on the website. The filter, or at least a rudimentary version of it, is available if you know where to look for it. We published a paper at USENIX back in June covering this program. Since then, I haven't done much development, because frankly, there are better ways to spend my time
Grammar Check and Spell Check... (Score:5, Insightful)

by LostCluster ( 625375 ) * writes: on Tuesday January 13, 2004 @10:29PM (#7969333)

The solution to randomness is to spell check and grammar check incoming e-mail, and consider violations as cause to add points to the score indicating that it's spam-like.

Sure, a few strange words might be a name that's not in the filter yet, but pure gibberish should be a red flag that either somebody's cat walked on the keyboard, or there's spam going on here. Heavy use of "non-spam" words can override to indicate it's good mail... but a poorly composed mail that doesn't use language seen in friendly mail is highly likely to be spam....

- Re:Grammar Check and Spell Check... (Score:5, Funny)
  
  by El ( 94934 ) writes: on Tuesday January 13, 2004 @10:39PM (#7969436)
  
  Wouldn't those same checks determine that 95% of /. postings are spam?
  
  - Re:Grammar Check and Spell Check... (Score:4, Funny)
    
    by sunspot42 ( 455706 ) writes: on Wednesday January 14, 2004 @02:10AM (#7970887)
    
    Yes. And your point?
    
- Re:Grammar Check and Spell Check... (Score:4, Interesting)
  
  by mrpuffypants ( 444598 ) * writes: <mrpuffypants@gmail . c om> on Tuesday January 13, 2004 @10:54PM (#7969559)
  
  The solution to randomness is to spell check and grammar check incoming e-mail
  
  Apparently you've never gotten emails from either a:
  
  1) 14-year old girl
  2) Gamer
  3) UNIX sysadmin describing a sendmail .cf file
  
  Yikes.
  
As if spam wasn't a big enough waste of bandwidth (Score:3, Insightful)

by Kris_J ( 10111 ) * writes: on Tuesday January 13, 2004 @10:30PM (#7969343) Homepage Journal

Try this: turn on the "size" column in your favourite email client. I use Eudora (Tools-options-Mailbox). Note that a normal plaintext email is 3k. Now look at the size of a spam. You're paying for that, or someone is. Soon the spam arms race is going to require everyone to have broadband just to check their email.
--
Still looking for an email replacement...

If someone made a gibberish filter? (Score:3, Funny)

by g00bd0g ( 255836 ) writes: on Tuesday January 13, 2004 @10:30PM (#7969353) Homepage

could it be used on politicians?

- Re:If someone made a gibberish filter? (Score:3, Funny)
  
  by Texas Rose on Lava L ( 712928 ) writes:
  
  It already exists. It's called the Mute button.
Different Techniques (Score:5, Interesting)

by kalidasa ( 577403 ) * writes: on Tuesday January 13, 2004 @10:33PM (#7969381) Journal

The article doesn't do a good enough job of explaining the different techniques in use.
First, hash busters. Yes, spammers are loading a random jumble of meaningful words in meaningless sequences into their spam, usually in the plaintext message body of a message with HTML content (i.e., you get hash buster - html message with spam content - hash buster). So HTML-aware clients (the main clients targeted I'm sure are AOL and Outlook Express) show the spam message, but not the hash buster. I'm guessing that this is specifically targeting bayesian filtering tools at AOL (anyone know if AOL is using a bayesian filter?); it works by introducing words that would not be found in a spam corpus in greater numbers than those that would.
Second, noisy spelling, like v1@gr@. Obviously this is also intended to defeat regex-based filters like spamassassin. If you vary your cliches enough, and you introduce very strange, but easy-for-a-human-reader-to-recognize spelling variants, you make it much more difficult for filter writers to write effective regexes.
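
One possible counter to the noisy-spelling trick, sketched in Python: fold look-alike characters back to letters before matching, so a single pattern covers many variants. The substitution table is a small invented sample, and the function names are hypothetical.

```python
import re

# Assumed look-alike substitutions; a real table would be far larger.
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "@": "a", "4": "a",
    "0": "o", "3": "e", "$": "s", "5": "s",
})


def normalize(word):
    """Lower-case, map look-alike characters to letters, drop everything else."""
    mapped = word.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", mapped)


def contains_term(text, terms):
    """True if any whitespace-separated token normalizes to a listed term."""
    return any(normalize(tok) in terms for tok in text.split())
```

Both `normalize("v1@gr@")` and `normalize("V.I.A.G.R.A")` come out as the same plain word. The obvious cost is that ordinary numbers get mangled too, so this only makes sense as one input to a scoring filter, not as a rule on its own.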

The real problem will be deliberate poisoning (Score:5, Interesting)

by Jerf ( 17166 ) writes: on Tuesday January 13, 2004 @10:33PM (#7969384) Journal

The real problem will be when the spammers finally figure out how to deliberately poison the Bayesian filters. So far they're using more-or-less random words, but that won't really work against Bayesian; it can tolerate that.

However, what constitutes "non-spam" is not as unique as most people think, as I've examined here [jerf.org]. If they figure out how to deliberately put in hammy words, Bayesian will fall.

I feel OK posting this because I freely admit that to this point I've overestimated them; I'm sure spammers have read that piece, and to date they have been too stupid to figure out what I said in plain English. But sooner or later one of them is going to figure it out.

There's a strong core of "ham" that is "ham" for everybody, and sooner or later they're going to start abusing that.

And if I may forestall one objection... "But you don't understand Bayesian, it's [awesome for some reason and can't be beat ever, by anybody]" - I'll listen when you've actually written a program to examine filters yourself, OK? I understand it pretty damn well. It'll take more than bald assertions to convince me I'm wrong, I've done actual research, in the original sense of the word.

- Re:The real problem will be deliberate poisoning (Score:3, Insightful)
  
  by Uggy ( 99326 ) writes:
  
  It's really simple. The ONLY way spammers can defeat Bayesian filters is if they imitate what you call ham. ham = What you want; spam = what you don't want. Unless they custom tailor each message or random words to each user and guess (through some form of magical powers) what kind of email you call ham, then they fail.
  
  Besides, if they could guess what your ham looked like, then they wouldn't be spammers... they'd be advertising folks pulling in 7 figures.
- Re:The real problem will be deliberate poisoning (Score:3, Insightful)
  
  by sidney ( 95068 ) writes:
  
  Nigerian scam spam is very different from most spam. It is a story that can be carefully written to use only words that are commonly used, assuming that the people who author them are able to go beyond their broken English all the way to use of statistically hammy correctly spelled text.
  
  But how would you sell more inches on your male member enhanced with V*@gra to make money fast watching celeb teenie nymphos doing it on the farm while only using ordinary non-spammy words?
  
  There are only so many ways to ge
- - Re:The real problem will be deliberate poisoning (Score:3, Interesting)
    
    by Jerf ( 17166 ) writes:
    
    Do you have evidence to back that assertion? In my case (I know it's just me), ham basically means either referring to my open-source projects or written in French (even then spambayes does a good job at rejecting French spam).
    
    Language is often a big indicator; since spam is aimed at a particular language group I don't consider it much. The fact my filter marks Japanese or Korean messages as spam is almost irrelevant, in a way, since I can't read it anyhow and it's easily dismissed.
    
    But there's this common
/usr/share/dict/words (Score:4, Interesting)

by HeelToe ( 615905 ) writes: on Tuesday January 13, 2004 @10:33PM (#7969386) Homepage

I thought about this after seeing my inbox spam increase to about 80 a day (the box that contains what is filtered is usually 10 per hour - my address has been valid for just short of 10 years).

Why not check the subject or first few lines of plain (not html) text and see if 80% of it is in /usr/share/dict/words? I thought about trying this out, but have been too busy to get off my ass and do it.
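
A quick sketch of that check, assuming Python. The 80% threshold and the word-list path come from the post; the function names and the tokenizing regex are made up for illustration.

```python
import re


def load_words(path="/usr/share/dict/words"):
    """Load a word list with one word per line, lower-cased."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def dictionary_fraction(text, words):
    """Fraction of alphabetic tokens in text that appear in the word set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in words) / len(tokens)


def passes_dictionary_check(text, words, threshold=0.8):
    """True if at least `threshold` of the tokens are dictionary words."""
    return dictionary_fraction(text, words) >= threshold
```

This would catch subjects made of strings like "c6as6g89y9aigah98", but a salad of real words such as "deprecatory parrot bizarre dessert" would sail through, since every token is in the dictionary.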

Slimier than slime . . . (Score:5, Interesting)

by mjprobst ( 95305 ) writes: on Tuesday January 13, 2004 @10:34PM (#7969391) Homepage Journal

I saw one just yesterday that contained a list of important key sentences and phrases from the literature of common charities and political activism organizations.

In other words, if your Bayesian filter accepts those, based on your past decisions, it will detect the spam. If you reject the spam, you reject these communications as well.

Good filtering practice would dictate that one reads the junk box carefully enough to find both false positives and negatives. But the sheer bulk of mail that ends up in the junk box makes this unfeasible for many.

I have started letting these particular kinds of spam through, manually categorizing them (many words of random strings, dictionary vocabulary attack, positive phrase attack) in the hopes that filtering technology will soon advance to the point where these can be used as inputs to a more intelligent system.

Of course overhauling the mail system is a prerequisite to solving any of this long-term. For once I don't mind D. J. Bernstein's Internet Mail 2000 proposals. Of course there are other proposed systems, none of which has enough momentum to start a slow steady change. The end result of any non-consensus system will be to fragment the worldwide network of Email into competing, noncompatible systems that need to communicate through some kind of loophole or gateway. Back to FIDO-net days.

I see this too (Score:5, Interesting)

by rockwood ( 141675 ) writes: on Tuesday January 13, 2004 @10:37PM (#7969418) Homepage Journal
I've been using "SpamBayes Outlook Plugin" since a previous /. article talked about it.
Agreeing with this article, over the past week or two I have seen an excessive amount of spam being missed by SpamBayes. Even after marking them as spam to improve the filter, they continue to hit the inbox, whereas previously absolutely no spam made it to my inbox. Additionally, there may have only been 2 or 3 emails marked as possible spam when they were not, and zero items marked as definite spam that were not.
SpamBayes has worked great previously, but now even it is falling short.
I feel that as the spammers manipulate the contents/context of the spam, it will eventually become impossible to determine the difference without physically looking at 500+ emails daily.
My primary use of email is business and not personal, therefore I cannot risk missing a client email, payment, question, etc... I've also seen a progression of clients having MY emails deleted or caught in spam filters due to the business aspect and requests for payments. I feel this is primarily due to the too-often-common phrases that a spam email and a business email both contain. Such things as Click here to submit payment, or Buy these Products, Overdue etc... Even though all clients I email are only clients that contact me. I never cold-email anyone.
More spammers are using this random text as the only text in the subject and body, and using an image as the content of their email, which makes scanning even more complicated, if not impossible.
Being on the net prior to what it is today (going on 20 years), I often wonder how much control the spam actually has over the net in several aspects
- If spam were to disappear, will overhead costs decrease that greatly in order for ISP's to pass along higher saving to the consumer?
- If Spam were to disappear completely, how much faster would the Internet be?
Has anyone ever done a study to determine how much effect spam has on degrading the net, and what would it be like if all spam was gone tomorrow?
The next attempt (Score:3, Insightful)

by eschasi ( 252157 ) writes: on Tuesday January 13, 2004 @10:38PM (#7969432)

As the article points out, the technique isn't as effective as one might initially think. However, there's a clear "next generation" method that I'm sure we'll soon be seeing:
Insert four or five lines of valid extra text -- lines from books, selections from recent USENET postings, etc, etc -- into the spam. Make the selection semi-random. Now do it 100 times and send 100 copies to each person on the mailing list.
One of them will get through. And the spammers will continue to work.

A method for removing spam from your life. (Score:5, Interesting)

by crazyphilman ( 609923 ) writes: on Tuesday January 13, 2004 @10:51PM (#7969535) Journal

It's old fashioned, and some of you will probably make fun of me for using it, but hey, I'm old school. FYI, here's my method:

1. Create manual spam filters (NOT Bayesian filters) in your inbox called "Friends and Family", "Work", "Services", "logfiles", and any others you find you need. Each category applies to a broad type of email address you'll receive email from. Then create a subdirectory in your inbox for each of these filters (named the same way, naturally).

2. For each filter, build a list of people who are allowed to email you. For example, your ISP, your bank, and your phone company would probably be added to services. Just add the email address they send their messages from to the list.

3. For each filter, have the filter move messages matching the filter (From equals ) to the correct subdirectory for the filter. Then stop processing for that message, so it doesn't get interpreted by other filters. Think of this as an analogy for ipfilter or ipfw in your firewall setup -- only you're filtering emails instead of packets.

4. Finally, DELETE EVERYTHING ELSE in the very last filter.
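
Steps 1 through 4 amount to a first-match-wins rule chain with a default-deny at the end. A minimal sketch in Python, with folder names taken from step 1 and every address invented for illustration:

```python
# Hypothetical sender lists, in the order the filters are tried.
RULES = [
    ("Friends and Family", {"dad@example.net"}),
    ("Work", {"boss@example.com"}),
    ("Services", {"billing@isp.example", "alerts@bank.example"}),
]
DEFAULT_FOLDER = "Deleted Items"


def route(sender, rules=RULES):
    """Return the folder for a message; the first matching filter wins."""
    addr = sender.strip().lower()
    for folder, allowed in rules:
        if addr in allowed:
            return folder  # stop processing, like a firewall rule match
    return DEFAULT_FOLDER  # the very last filter: everything else is deleted
```

Matching on the exact From address is what makes this cheap, and it is also its weak point: a known correspondent who changes address lands in the deleted items until the list is updated.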

You USE this approach by doing a quick scan of the deleted items folder to see if anything is interesting. If not, just clean out those deleted items. It's a one step operation, much easier than selectively deleting a hundred emails one at a time.

Then, you scan each of the folders you set up, IF the folder has picked up an email, focusing only on your REAL email.

This approach has saved me a HUGE amount of work lately. My life is a whole lot easier, and it's way easier than trying to train a Bayesian filter. If I don't know you, you can't get too much of my attention.

It's all about being on the list, sort of like getting into a nightclub... ;)

- Re:A method for removing spam from your life. (Score:4, Funny)
  
  by John Jorsett ( 171560 ) writes: on Wednesday January 14, 2004 @01:05AM (#7970573)
  
  Phil! Thank God! I've been trying to get in touch since I had to change ISPs and you stopped answering my email. How have you been?
  
  Dad
  
- Re:A method for removing spam from your life. (Score:3, Insightful)
  
  by ediron2 ( 246908 ) * writes:
  
  Phil;
  Twice in this thread, I see you talking about training the bayesian filter. You seem to think this is something of a burden, like training a big dog...
  I think you misunderstand how easily one trains the current Mozilla email client's bayesian filter.
  Day 1:
  1: the mail comes in, spam included.
  2: one of the inbox columns is a blue 'recycle' lookin' symbol. It is a toggle that acts like the 'new' indicator column, and a click on it turns state on or off.
  3: glancing through the list, one clicks o
Simple trick that is semi-efficient (Score:5, Interesting)

by tomstdenis ( 446163 ) writes: <tomstdenis.gmail@com> on Tuesday January 13, 2004 @10:52PM (#7969540) Homepage

Just block the domain name/ip of the hosted images. Most spams I get come from random IPs but usually have common IP/domain name for the hosted images e.g.

hostz300001.com/ads/viagra.jpg

Or whatever. I've cut down from 50 spams to about 3 or so a day by doing that.
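
A sketch of how that check might be automated, assuming Python and HTML message bodies. It pulls the hostnames out of `<img>` tags and compares them with a blocklist. The blocklist contents and function names are illustrative, and the regex is a deliberately crude stand-in for a real HTML parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist; the hostname is the example quoted above.
BLOCKED_HOSTS = {"hostz300001.com"}

# Crude match for the src attribute of an <img> tag, quoted or not.
_IMG_SRC = re.compile(r"""<img[^>]+src\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def image_hosts(html):
    """Return the set of hostnames that the message's images are loaded from."""
    hosts = set()
    for src in _IMG_SRC.findall(html):
        if "://" not in src:
            src = "http://" + src  # urlparse needs a scheme to find the host
        host = urlparse(src).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def has_blocked_image(html, blocked=BLOCKED_HOSTS):
    """True if any image in the message is hosted on a blocked domain."""
    return not image_hosts(html).isdisjoint(blocked)
```

The sending IP can change with every message, but the image host is what the spammer is paying for, which is presumably why blocking on it holds up as well as it does.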

I bet a bayesian filter would work nicer but unfortunately I'm too lazy to mod the mail setup [that isn't mine] to get one installed..

Tom

- I use that method (Score:3, Informative)
  
  by KalvinB ( 205500 ) writes:
  
  includes sourcecode [icarusindie.com]
  
  Mercury Mail's session logs indicate a closed connection to indicate where e-mails begin and end but if you're using something else there's a RinetD mod with source which logs e-mails in such a way so that ripping through them is easy.
  
  My filter is all of 23KB and I get virtually no spam. I update every once in awhile when a spam gets through.
  
  I also have a couple sub-domains that point to a spamcan on my home connection which I use to bait spammers so I can preemptively filter them ou
Word Salad (Score:3, Interesting)

by JohnGrahamCumming ( 684871 ) * writes: <slashdot@@@jgc...org> on Tuesday January 13, 2004 @11:03PM (#7969623) Homepage Journal

Weird. I am talking about this at the MIT Spam Conference [spamconference.org] on Friday and on a technique that can break a Bayesian spam filter.

John.

How I deal with spam (Score:3, Interesting)

by mabu ( 178417 ) writes: on Tuesday January 13, 2004 @11:08PM (#7969658)

I have had my main e-mail published and unchanged since 1995. It's probably on 99% of all spam mailing lists. One of my servers handles about 600 POP3 accounts. My stats currently indicate that now more than 80% of our SMTP traffic is confirmed spam.

I don't believe in content-based filtering. We have a strict policy of not examining in any way, shape, or form, the content of any e-mail on our network.

We deal with spam by implementing an array of fully-tested, fairly conservative relay blacklists which block the inbound SMTP connection before the junk mail is even transmitted.

In more than two years of operation, we've only confirmed about six legitimate e-mails that were blocked, and we handle tremendous mail volume. It's an easy matter to "whitelist" anyone who might end up getting RBL'd to make sure the client can communicate with who they want. In EVERY case where a legitimate source was blacklisted, it was shown their ISP was irresponsible and the listing was valid.

In addition to using RBLs, we also have an array of hard-coded IP blocks that our server will not accept mail from. This covers a good bit of the rogue Asia-pacific ISPs that are the largest source of open relays. Something as simple as blocking major portions of 61.* have shown to reduce spam by 30+%. Anyone legitimately in China that needs to communicate with our network can be quickly whitelisted. Ironically, most of the ISP SMTP relays are not near the same broadband IP ranges - they obviously know how effective this technique is.
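
For illustration, a hard-coded block with a whitelist override could look something like this in Python, using the standard `ipaddress` module. Writing "61.*" as `61.0.0.0/8` is an interpretation of the shorthand above, and the whitelisted address is invented.

```python
import ipaddress

# Example ranges only; "61.*" is written here as a /8.
BLOCKED_NETS = [ipaddress.ip_network("61.0.0.0/8")]
# Hypothetical sender inside the blocked range that asked to be whitelisted.
WHITELIST = {ipaddress.ip_address("61.12.34.56")}


def accept_connection(peer_ip):
    """Decide whether to accept an inbound SMTP connection from peer_ip."""
    ip = ipaddress.ip_address(peer_ip)
    if ip in WHITELIST:
        return True  # whitelist entries override the hard-coded blocks
    return not any(ip in net for net in BLOCKED_NETS)
```

Because the decision needs only the peer address, it can be made before any message data is transmitted, which fits the no-content-inspection policy.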

With RBLs and hard-coded IP blocks in effect, instead of 200 spams a day, I might get 3-5. As soon as I get new spam, I report it to Spamcop, and I notice a quick reduction in future spam of that nature immediately.

We're now getting near the point of blacklisting the entire 24.* IP block as well - which encompasses, among other things, a large portion of Comcast IP blocks that Comcast can't or won't control.

I'd like to see more ISPs simply refuse to accept mail from rogue networks. Then these networks would have to be more responsible.

Let me preface all this by saying our policy is to whitelist anyone who complains they have legitimate mail being blocked. For some strange reason, we don't hear any spammers making these requests. That's a shame because I'd be happy to visit them personally to make sure their situation is resolved in a mutually-deserving manner.

- - Re:How I deal with spam (Score:3, Interesting)
    
    by mabu ( 178417 ) writes:
    
    That's the real problem with blocking by IP ranges. I'm in 24.* because it's the only high-speed Internet I can get. It's not Comcast but I see tons of probes from infected machines local to me in my area of 24.*. But I'm not the only legitimate business living in a broadband network that contains tons of clueless residential subscribers. What would you have us do, get T1 lines and $3,500/mo ISP feeds? Go back to dialup? What's wrong with this picture?
    
    We're not blocking all of 24.* right now because ther
What about Bayes on word n-tuplets? (Score:3, Interesting)

by adrianbaugh ( 696007 ) writes: on Tuesday January 13, 2004 @11:39PM (#7969917) Homepage Journal

It seems to me it would be much harder to poison a filter that did Bayes by splitting email into word pairs or triplets and assigning ham and spam probabilities for each. That way the bad grammar and random word lists would be extra-bad. I suspect longer sequences would become harder and harder to foil. They might require extra training of the database, but if you're getting lots of spam that isn't really a problem. Perhaps the word sequence length could be configurable.
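
The tokenizing half of that idea is small enough to sketch in Python, with a configurable sequence length. The function name and the regex used to split words are assumptions; the resulting tuples would then be counted and scored the same way single words are.

```python
import re


def word_ngrams(text, n=2):
    """Return the n-word tuples in text, in order of appearance."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
```

With `n=2`, a random word list produces pairs like `("parrot", "bizarre")` that are unlikely to have appeared in anyone's legitimate mail, so they carry no ham weight even when each word alone might.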

Habeas SWE in spam (Score:3, Interesting)

by YetAnotherDave ( 159442 ) writes: on Wednesday January 14, 2004 @12:42AM (#7970418)

Has anyone else seen a spurt of Habeas SWE headers in spam?

I'd never seen any until this week, and suddenly I've got like 5/day.

I forwarded them to the good folks at habeas, hopefully the spammer will get sued into oblivion, but it's forced me to re-score SWE with a much lower bonus in spamassassin...

http://habeas.com/servicesHowSWEWorks.html for those who don't know what I'm talking about, btw

Gibberish, or code? (Score:5, Interesting)

by cr0sh ( 43134 ) writes: on Wednesday January 14, 2004 @02:16AM (#7970904) Homepage

I, too, have noticed these seemingly random words that seemed to have nothing to do with the main text of the spam. I have also noticed the "gibberish words". One of my thoughts was that it was for defeating or bypassing bayesian filters - and likely, that is the case. But my thoughts turned to another possible use...
What if spam - and the spammers' software - was actually being used by a third party in a surreptitious manner to send/receive messages? Kinda like plaintext stego. Maybe the software used by spammers is backdoored by this third party - he sends instructions to the machine(s), maybe via a virus or something simpler, the spammers send their messages, but "unknown" to them the spams have this garbage at the end. The spammer doesn't really care, maybe he bitches at whatever passes as tech support for the spam software. Most people who receive the spam see the stuff as garbage, or filter busters. But a certain group of the third party's friends - they have special email software that downloads these spams, and strips the garbage out, decodes it, and reassembles it into the real message. Maybe each spam only contains the equivalent of a couple of characters after decoding (maybe the garbage is actually packets telling order in the sequence, and other info to reconstruct the message) - but over a week or so, an entire message could be sent...
What is the possibility of that? Occam's Razor suggests otherwise, and filter busters are probably what the stuff is - but...what if...?
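
For what it's worth, the fragment-and-reassemble idea in this comment can be sketched as a toy in Python. Everything here is hypothetical: the names `to_fragments` and `from_fragments`, the two-byte payload size, and the token layout (two hex digits of sequence number followed by the hex-encoded payload) are assumptions made up for illustration, not anything observed in real spam.

```python
def to_fragments(message, size=2):
    """Split a message into gibberish-looking tokens, one per spam.

    Hypothetical layout: 2 hex digits of sequence number + hex payload.
    """
    data = message.encode("utf-8")
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    if len(chunks) > 256:
        raise ValueError("sequence number would not fit in two hex digits")
    return [f"{seq:02x}{chunk.hex()}" for seq, chunk in enumerate(chunks)]


def from_fragments(tokens):
    """Reassemble the message from tokens collected in any order."""
    # Sort by the sequence number carried in the first two hex digits.
    parts = sorted((int(tok[:2], 16), bytes.fromhex(tok[2:])) for tok in tokens)
    return b"".join(payload for _, payload in parts).decode("utf-8")


fragments = to_fragments("meet at noon")
# Spams may arrive in any order; reversing stands in for that here.
print(from_fragments(list(reversed(fragments))))
```

Since each token carries its own sequence number, the receiver doesn't care which spam arrives first, which is the "packets telling order in the sequence" part of the idea.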

- Re:Gibberish, or code? (Score:4, Funny)
  
  by Steve B ( 42864 ) writes: on Wednesday January 14, 2004 @09:10AM (#7972270)
  
  What if spam, and the spammers' software, was actually being used by a third party in a surreptitious manner to send/receive messages? Kinda like plaintext stego. Maybe the software used by spammers is backdoored by this third party: he sends instructions to the machine(s), maybe via a virus or something simpler, and the spammers send their messages, but "unknown" to them the spams have this garbage at the end. The spammer doesn't really care; maybe he bitches at whatever passes as tech support for the spam software. Most people who receive the spam see the stuff as garbage, or filter busters. But a certain group of the third party's friends have special email software that downloads these spams, strips the garbage out, decodes it, and reassembles it into the real message. Maybe each spam only contains the equivalent of a couple of characters after decoding (maybe the garbage is actually packets telling order in the sequence, and other info to reconstruct the message) - but over a week or so, an entire message could be sent...
  This would be a very useful method for terrorists -- it would not only conceal the message itself, but also would defeat traffic analysis (i.e. nobody would be able to tell who sent or received the message -- it's sent by a spam king and received by everybody).
  About the only way to guard against it -- or find out if the terrorists are already using this channel -- is to anal-probe all spammers for their client lists, then anal-probe all the clients. Fortunately, the obvious criminal content of 99.9% of spam provides sufficient probable cause for such action.
  
The real reason behind the weird typing in spam: (Score:3, Funny)

by phaze3000 ( 204500 ) writes: on Wednesday January 14, 2004 @04:07AM (#7971287) Homepage

Narcoleptic spam creators [theregister.co.uk]

- Re:Should be easy to block (Score:5, Insightful)
  
  by kalidasa ( 577403 ) * writes: on Tuesday January 13, 2004 @10:50PM (#7969526) Journal
  
  Most of them are using random word sequences; the random strings like xdwexe are not usually an important percentage of the overall text, no more than names might be. Besides, how large a corpus of "valid" words do you want to use? The OED weighs in at almost 0.5M words, and then there are another 0.5M uncatalogued scientific terms and neologisms, plus common misspellings and typos and jargon and dialect orthography (like our color, meter, checker, jail etc. for the Brits' colour, metre, chequer, gaol) ...
  If you don't want to keep the entire corpus of "valid" words in your code, you're going to have to make some compromises. Maybe you'll want to exclude words like "thou," "hauberk," and "coney." Not so good if you're subscribing to an Early Modern Literature listserv.
  So you're going to need some logic to determine whether or not a "valid" word that occurs in a message is meaningful. Here's how one rather well-known discussion [paulgraham.com] of Bayesian filtering deals with the issue of unknown words; this is precisely the logic that spammers padding with random meaningful words are exploiting:
  One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .4 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar.
  
  So, what if all the words are valid, but the sentences aren't? Grammar checkers involve a lot more logic than spellcheckers do, and are consequently a lot less accurate. Fact is, you can also fool a grammar checker filter: just pad with random quotations from novels, etc. instead of padding with random words or random misspelled strings.
  So the Bayesian approach of identifying spam and ham words is a pretty effective one, given the limitations.
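
To put rough numbers on the unknown-word rule quoted in the comment above, here is a minimal sketch in Python. The 0.4 default comes from the quote; combining only the 15 tokens farthest from 0.5 and the product formula follow the linked essay as I understand it. The function name `spam_probability`, the per-word probabilities, and counting each distinct token once are assumptions made for illustration.

```python
from math import prod

UNKNOWN_PROB = 0.4  # probability quoted above for a never-seen word
TOP_N = 15          # number of "most interesting" tokens combined


def spam_probability(tokens, word_probs):
    """Combine per-word spam probabilities into one score for a message."""
    # Assumption: each distinct token counts once; unseen tokens get UNKNOWN_PROB.
    probs = [word_probs.get(tok, UNKNOWN_PROB) for tok in sorted(set(tokens))]
    # Keep the tokens whose probability lies farthest from the neutral 0.5.
    interesting = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:TOP_N]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


# Made-up per-word probabilities, purely for illustration.
word_probs = {"viagra": 0.99, "click": 0.95, "free": 0.90}

plain = ["viagra", "click", "free"]
padded = plain + [f"xq{i}z" for i in range(20)]  # 20 never-seen gibberish words

print(f"plain:  {spam_probability(plain, word_probs):.4f}")
print(f"padded: {spam_probability(padded, word_probs):.4f}")
```

With these made-up numbers the gibberish pulls the score down a little, but it stays above 0.9: at most 15 tokens are combined, and 0.4 sits close to neutral, so strongly spammy words still dominate.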
  
- Re:I keep praying for that silver bullet (Score:3, Insightful)
  
  by Steve B ( 42864 ) writes:
  
  I keep praying for that silver bullet that will end spam forever.
  What it will take is the enforcement of existing computer-cracking laws. Spammers will then have a choice between 5-10 year sentences or sending spam with no munged words, forged headers, misleading subject lines, etc.
