Spam Your Rights Online

Honeypot For Identifying Email-Harvesters

Cheese Man writes "Mark Pilgrim describes a simple way to identify email-harvesters: "In each page I serve, I include a bogus email address, encoded with the date of access as well as the host IP address ... This has allowed me to trace spam back to specific hosts and/or robots." There's even a simple one-line example done with PHP. (Thanks to BoingBoing for the links.)"
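A minimal Python sketch of the idea in the summary: build a bait address that carries the visitor's IP and the access date, then recover both when spam arrives at it. The address layout, the function names, and `example.com` are assumptions for illustration and are not taken from the linked article.

```python
from datetime import date

BAIT_DOMAIN = "example.com"  # assumption: a domain whose mail you control


def make_bait_address(ip: str, day: date) -> str:
    """Encode the visitor's IPv4 address and the access date in the local part."""
    return f"{ip.replace('.', '-')}_{day:%Y%m%d}@{BAIT_DOMAIN}"


def parse_bait_address(address: str) -> tuple[str, date]:
    """Recover (ip, access date) from an address built by make_bait_address."""
    local = address.split("@", 1)[0]
    ip_part, day_part = local.split("_", 1)
    day = date(int(day_part[:4]), int(day_part[4:6]), int(day_part[6:8]))
    return ip_part.replace("-", "."), day
```

Each page view would embed `make_bait_address(remote_ip, date.today())`; when spam later arrives, `parse_bait_address` on the recipient address points back at the host that fetched the page.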
  • I say... (Score:5, Interesting)

    by JoeLinux ( 20366 ) <joelinux.gmail@com> on Saturday June 21, 2003 @05:51PM (#6263587)
    That there should be email addresses that the big companies "float" out onto spamming lists. When a mass email arrives addressed to one of them, that flags it as spam, and the whole message can be blocked from entering the system. Of course, security on what those email addresses are would have to be pretty tight...
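One way to picture the suggestion above, as a rough Python sketch: mail that reaches a secret bait address is treated as certain spam, and its body fingerprint is then used to block copies sent to real users. The bait address, the function names, and the exact-hash matching are all assumptions; a real filter would need fuzzier matching, since spammers vary each copy.

```python
import hashlib

BAIT_ADDRESSES = {"bait-7f3a@example.com"}  # hypothetical secret bait address
known_spam = set()  # fingerprints of bodies seen at bait addresses


def fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace and case."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_block(recipients: list[str], body: str) -> bool:
    """Learn from mail sent to bait addresses; block matching bodies elsewhere."""
    digest = fingerprint(body)
    if any(r.lower() in BAIT_ADDRESSES for r in recipients):
        known_spam.add(digest)
        return True
    return digest in known_spam
```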
    • Re:I say... (Score:3, Interesting)

      by Eyston ( 462981 )
      This is exactly what a lot of them do.

      I think EarthLink's Spam Blocker is using that idea.

      -Eyston
      • Re: I say... (Score:4, Insightful)

        by gidds ( 56397 ) <slashdot@gidds . m e .uk> on Saturday June 21, 2003 @06:44PM (#6263781) Homepage
        BrightMail [brightmail.com], too. My ISP [cix.co.uk] uses it - it traps about 70% of my spam. The great thing is that it has no false positives, so it just shunts every spam it identifies off to a separate mailbox which you need never bother with - you don't spend time or bandwidth downloading it. (A few times a year I take a look at the stuff it's recently trapped just to check, but there's never been a single valid mail.)
        • Re: I say... (Score:3, Interesting)

          by JDevers ( 83155 )
          BrightMail definitely DOES have false positives. At my summer job (last summer, this year I am covered by assistantship :) as tech support at an ISP that used BrightMail I don't remember a week going by without someone complaining that our spam filter had caught some of their legit mail. Most of these were borderline spam but a sizable chunk were perfectly normal mail that had no "spamness."
          • Re: I say... (Score:3, Interesting)

            by gidds ( 56397 )
            I find that very strange, for two reasons:
            1. In my experience, it's caught spams probably into 5 figures by now, of which I've personally checked probably over a thousand, absolutely none of which were legitimate mail. And
            2. BrightMail's method can only find spam. Their honeypots have absolutely no legitimate use, so all the mail they get must be spam: untargeted, mass mailing, to an unchecked, harvested list of addresses. Assuming BrightMail then blocks only those mails, then I don't see how it can be b
            • BrightMail's method can only find spam. Their honeypots have absolutely no legitimate use, so all the mail they get must be spam: untargeted, mass mailing, to an unchecked, harvested list of addresses.

              Ok, so they have 100% certain spam mail examples. How do they then use them to block new mail? Do they block the From:? That can be forged, and is often a real innocent person. Do they block the IP? That may well be a normal mail server. Etc.

              This is just a naive thought, but I was wondering how they so

              • How do they then use them to block new mail?

                Ah. That I don't know. And of course that's the bit that's susceptible to errors.

                As I said, I haven't noticed a single false positive in the mail it's trapped for me. Is it possible that different ISPs use BrightMail's info in different ways? Is it too late to ask if anyone knows any more about this?

    • Re:I say... (Score:3, Informative)

      Congratulations! You have just re-invented SPEWS (spews.org).
      • Re:I say... (Score:3, Informative)

        Huh?

        No, spews is only based on reports to a news group and some unknown persons' responses to those reports.

        Talk about false positives. When you block entire class C networks, you are going to get false positives. I can find a network listed with them, and send email from a machine on that network (that has NEVER sent spam before) and spews will block it. Was my email spam? NO, therefore it's a false positive.

        Plus when it takes over 6 months to get a network removed (if not longer), it is just about
          • "I can find a network listed with them, and send email from a machine on that network (that has NEVER sent spam before) and spews will block it."

          Mail may be blocked by ISPs referring to the SPEWS listings and deciding whether to let an email pass or not. But SPEWS itself does nothing to the stream of email. SPEWS does not block anything.

          SPEWS publishes a listing of IP addresses that have been used to send spam to its bait addresses, and IP addresses of spam-friendly ISPs. If someone spams SPEWS, t

    • Isn't that why there are newsgroups like alt.sex.unix? A newsreader could assume anything posted there is spam and filter similar messages from other groups.

      What, that isn't the reason?

      *ahem* I'll be going now...
    • by The Monster ( 227884 ) on Saturday June 21, 2003 @11:40PM (#6265091) Homepage
      1. Set up one or more machine names on your domain specifically for spam traps.
      2. All email addresses on your page are munged thusly: When a computer at 123.45.67.89 requests a page containing the email address
        Dr. John Q. Doe <john.doe@isp.com>
        it becomes
        Dr. John Q. Doe (john DOT doe A-T isp DOT com) <16552.IP.123.45.67.89@spamtrap.domain.org>
        where the exact formula should be a bit vague, so as not to be easily defeated by bots, but obvious to humans
      3. The email server for spamtrap.domain.org is a Teergrube (tarpit) that locks up the spamming computer AND sends notification back to the web site to serve that IP links to a world-wide tarpit ring, so as to get the spammers as many tarpit email addresses as possible
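Step 2 of the list above can be sketched in Python roughly as follows. The function name and the `token` parameter are assumptions: the comment does not say what the leading `16552` stands for, so it is treated here as an opaque per-request number. A real page would also need to HTML-escape the angle brackets.

```python
TRAP_HOST = "spamtrap.domain.org"  # the trap host name used in the comment's example


def munge(display_name: str, real_address: str, visitor_ip: str, token: int) -> str:
    """Return the page text: a human-readable hint plus a per-visitor trap address."""
    local, domain = real_address.split("@", 1)
    readable = f"{local.replace('.', ' DOT ')} A-T {domain.replace('.', ' DOT ')}"
    trap = f"{token}.IP.{visitor_ip}@{TRAP_HOST}"
    return f"{display_name} ({readable}) <{trap}>"
```

Humans read the parenthesised hint, while a harvester that grabs anything shaped like an address picks up the trap address tied to its own IP.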
    • Like razor [sourceforge.net]?
  • by Tuxinatorium ( 463682 ) on Saturday June 21, 2003 @05:52PM (#6263593) Homepage
    Unfortunately, there is still no law against email harvesting, so there is nothing you can do to them unless you want a little vigilante justice.
    • Include a notice on the page that prohibits harvesters from using your page, then sue them for license violations.

      Jason
      ProfQuotes [profquotes.com]
      • by Anonymous Coward
        Nah, just put up a WebPoison page and spoil their ill-gotten gains by fooling the harvesters into grabbing lots of apparently valid (though very fake) email addresses. If enough of their customers get pissed for being sold bad email lists, eventually the problem will be lessened. http://www.monkeys.com/wpoison/ "So the basic idea behind Wpoison is to trap unwary and badly engineered address harvesting web crawlers, and to fool them into adding enormous quantities of completely bogus e-mail addresses to the
        • WebPoison has been around for a while, so I wouldn't be surprised if spamware can detect and filter wpoison pages. (Barring a wpoison tweak to fool that spamware, followed by a tweak of the spamware, etc.)
        • I have a similar system, but the problem is that it's very easy to filter for "only list email addresses on the same domain as the website", which would invalidate most of the hollings@senate.gov entries.

          And there's little point listing fake addresses on your own domain, because your mailserver still has to handle them.

          It might be worth using an email address on another domain if you have such a system, which should get your email address filtered out by spammers.

          It's also worth considering people who fi
      • I doubt such a notice has any legal bearing, not to mention that you have to show actual damages as a consequence of their action.
    • Come on, you can't have it both ways. You're either pro government control or against it, you can't say "these people can't have freedom because i don't like them, but don't take away my freedom because people don't like me"

      -Jon
      • by gad_zuki! ( 70830 ) * on Saturday June 21, 2003 @06:37PM (#6263758)
        > Come on, you can't have it both ways.
        > You're either pro government control or against it,

        Why not?

        Things are rarely polar opposites. You can't just say, "Well kid, are you a communist or for a laissez-faire market." There's tons of middle ground.
        The formal name for this is the False Dichotomy. [info-pollution.com] More [ucsd.edu]
        Extremes only really exist as abstract concepts.

        Advocating regulation or laws to protect against abuse is hardly pro-DMCA.
        • "It only takes 20 years for a liberal to become a conservative without changing a single idea." Robert Anton Wilson

          I think Robert has it backwards. The quote implies that the world is slowly drifting to the left. In reality, it is drifting right (at least in America). If you could go backwards in time, he might have a point.

          But then you have to take apart what liberal and conservative really mean. Some would say it means large vs small government. Others think it means government protection of human right
      • Ummm, actually, we can have it both ways. That's the beauty of democracy.

        The DMCA can prevent fair use, a right most people believe they have.

        Most spammers are committing fraud, a crime most people believe should be punished.

        sometimes I really wonder how people like you get along in the world. I highly doubt you are *one way or the other* in life. No one is.
      • Freedom can be taken away based on immoral, anti-social or illegal behavior. We just define what behaviour we find unacceptable, outlaw it and we are perfectly capable of selectively removing freedom. Not all behaviors are equal and some are downright detrimental to the common good. Let's not hogtie ourselves because we are unable to exercise any kind of rational discernment.
    • While there's no way to pursue email harvesters through legal channels, there's other ways this technique is useful.

      In the example given, the spam harvester used a unique User-Agent string and a constant IP address for spidering. As a web site owner, you could block requests based on either of those credentials. In addition, you can publish your findings so that other web sites and networks can block the harvesters you find too.

      You can also complain to the harvester's ISP. Since spam is often sent with
    • In many places, spam itself isn't illegal either; however, most ISPs worth their salt will gladly rid themselves of customers that spam, and I would imagine they would be willing to do the same to harvesters. Of course, this isn't as cut-and-dried as reporting a spam, since you won't have the ever-so-informative email headers to provide evidence, but if enough individual reports come in, it would probably be effective.

      -Restil
    • True, however no website has a guarantee that you have to serve them with any data or pages.
    • You could bite back. Instead of trying to track them how about including the email address of the postmaster at the machine calling the page. That way when a harvester at j3rk.ugh.com calls your page it sees an address postmaster@j3rk.ugh.com. The harvester then sells his own address to the spammers. Then sit back and hope that the harvester decides to try to grow his organ enough that he doesn't need to do this stuff....
      • postmaster@j3rk.ugh.com doesn't really care.

        If, perchance, it is a company that makes its bread and butter collecting and selling e-mail addresses to the gullible, they probably already KNOW what they are doing, and you reminding them does nothing but give you a warm feeling.

        Another option is some retail user - there probably is no postmaster@CPE0080c6ef6343-CM0143000000054.cpe.net.cable.rogers.com just to pull a random IP address out of my log file.

        And finally the last case -- you hit the 'jackpot'

  • I did something similar for a while but stopped because I didn't really have any use for it. Using primarily my ISP's mail service, there's not much I can do to customise it. At some point I intend to set up some sort of thing that feeds into a DNS blacklist, but when that will be I just don't know. It's probably already been done, but heck, it's the taking part that counts. Or something like that
  • Nothing new (Score:5, Informative)

    by Rosco P. Coltrane ( 209368 ) on Saturday June 21, 2003 @05:55PM (#6263609)
    Lots of people, including me, use different middle names or initials when applying for something in writing, by snail mail or by telephone. When junk mail comes back in the mailbox, it's easy to know what company sold your information to whom, or at least which company was the initial recipient of the bogus info and which was the last.

    Old news ...
    • Lots of people, including me, use different middle names or initials when applying for something in writing, by snail mail or by telephone. When junk mail comes back in the mailbox, it's easy to know what company sold your information to whom, or at least which company was the initial recipient of the bogus info and which was the last.

      Not quite, I do the same thing, but you still end up with a lot of spam on the e-mail addresses you publish on your web-page, and you do not change these every day by hand.

    • When I register for stuff online, I often use email addresses like sales@127.0.0.1

    • Re:Nothing new (Score:3, Interesting)

      by Technician ( 215283 )
      It was a few years ago, but I had a typo on my car registration and title. I was going to get it fixed, but within 2 days of my registration, I got mail with the same wrong name. Then I started getting sales calls. I never fixed the registration. My vehicle registration was good for about 1/3 of my snail mail junk.

      It came from places you wouldn't expect it. Siding salesmen were the worst. I was renting an apartment at the time.
  • wpoison (Score:5, Informative)

    by Gothmolly ( 148874 ) on Saturday June 21, 2003 @05:56PM (#6263611)
    Try wpoison [monkeys.com], it's a CGI script to generate a random set of email addresses, infinitely deep. Very fun.
    • Re: wpoison (Score:5, Funny)

      by Black Parrot ( 19622 ) on Saturday June 21, 2003 @06:05PM (#6263649)


      > Try wpoison, it's a CGI script to generate a random set of email addresses, infinitely deep. Very fun.

      I'm trying to invent an e-mail address that explodes if anyone tries to use it.

      • A friend of mine thought of something similar once -- what if you had two email addresses that had forwarding rules set to each other? In other words:

        joe@abc.com auto-forwards all incoming email to joe@xyz.com
        joe@xyz.com auto-forwards all incoming email to joe@abc.com

        It's the classic "10 GOTO 20; 20 GOTO 10", but with email accounts. Has anyone out there tried this?
        • You get a mail loop. You don't want that to happen.
        • IIRC, there's some sort of mechanism that will stop mail if it is relayed more than x times. Maybe someone can expand on that for ya. I'd look it up in the RFC, but I'm busy eating a hamburger.
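The mechanism alluded to above is usually a hop count: each server that relays a message prepends a `Received` header, and a server can refuse mail that carries too many of them. A small Python sketch of that check follows; the limit of 25 and the function name are illustrative assumptions, and real mail servers make the threshold configurable.

```python
from email import message_from_string

MAX_HOPS = 25  # assumption: an illustrative limit, not a value mandated here


def too_many_hops(raw_message: str, max_hops: int = MAX_HOPS) -> bool:
    """Treat a message as looping when it carries too many Received headers."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", [])) > max_hops
```

With the two auto-forwarding accounts described above, every bounce between abc.com and xyz.com adds headers, so the loop would end once the count passes the limit.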
        • That doesn't hurt the spammer any. The only people it hurts are the owners of abc.com and xyz.com, who have to perpetually send email between their servers until one of them decides to give up for whatever reason. The spammer doesn't have to see any of it. Forwarding is done without any response or notification to the original sender, so the spammer just sees it as another email that got delivered and doesn't have to worry about the bandwidth that's being generated on innocent servers that's not even harmin
      • I know this was modded funny, but since it's possible to write an email address in a form that stops Qmail delivering it, it may well be possible to generate one that causes a buffer overflow or other problem in sendmail/exchange/other mail app.

        It would be rather amusing to r00t a bunch of dirty spammers via this technique. Use their boxes to grab kiddie pr0n from all over the net and then tip off the feds or something.
      • I'm trying to invent an e-mail address that explodes if anyone tries to use it.

        I certainly wouldn't want to be near my mail server when a spammer strikes...

        Though, suddenly, I can't help but think a certain Utah congresscritter might be able to help you :)

    • Re:wpoison (Score:3, Interesting)

      by yog ( 19073 )
      great idea; I have a static page with thousands of random email addresses generated by this Perl script [tastysoftware.com], but this wpoison is sweet; the pages seem genuine and it would keep a robot busy for a long time.

      I'd like to see millions of web sites adopt this approach; then perhaps spammers would be overwhelmed by bogus email addresses and it would cost them more money to figure out ways around it, if it's even possible.

      The principle is similar to the Nigerian spam baiting [terrytraub.org] that some of us engage in; if thousan
      • Catch Bad Bots in a Bot Trap [kloth.net]

        You put a line in your robots.txt saying that bots are not allowed to access a certain directory or file. Then you put an invisible link to said directory or file on your home page. Any host that makes a request for the forbidden file is an evil bot, and gets blacklisted and/or reported to some other authority.
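The bot trap described above could look roughly like this in Python. The trap path, the in-memory blacklist, and the `handle_request` function are assumptions made for the sketch; a real site would hook this into its web server and persist the list.

```python
TRAP_PATH = "/bot-trap/"  # hypothetical path, also listed as Disallow in robots.txt
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

blacklist = set()


def handle_request(client_ip: str, path: str) -> int:
    """Return an HTTP status: 403 for blacklisted hosts, 200 otherwise."""
    if path.startswith(TRAP_PATH):
        blacklist.add(client_ip)  # only a client ignoring robots.txt gets here
    return 403 if client_ip in blacklist else 200
```

Well-behaved crawlers read `ROBOTS_TXT` and never request the trap path, so only clients that ignore it end up on the list.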
  • by Anonymous Coward on Saturday June 21, 2003 @05:56PM (#6263615)
    Last line of the article:

    title edit (6/19, 6:47am): Honeypot not "honey hole." Thanks, Cory.

    What's the difference between the two? Computer geeks have experience with honeypots!
  • What is it? Do you politely ask the spammers / bots to stop? Why should they. You have a server, they are looking for information.

    • ........use the fact that they gathered information from the server to get the IP addresses they searched from blackholed.

      The fact that there is no law against you collecting data does not mean that the people providing that data can't use the fact that you collected that data to prevent you from sending large volumes of e-mail to them.

      Likewise this will rapidly identify open-proxy sources that may also be used to send spam at another time.

      -Rusty
    • That's easy. I firewall them against all incoming traffic. No more spam from them, and frankly I don't care if the originator (even if innocent) suffers. If I happen to supply something they want, they can fix their damn IT systems before they get back online to me :-)

      What they're doing is not illegal, but neither is what I'm doing...

      Simon
    • If they are misbehaving bots (feed them a robots.txt too), just block their IPs and don't bother being polite. (Or feed them wpoison.)
    • They are not simply looking for information. They are mining websites in order to find email addresses to send promotional offers to.

      This is analogous to a junk mailer going down to city hall and getting a list of physical addresses to which to send his promotional material.

      There are some important differences:

      1. City hall generally will not give up the names and addresses of its citizens to just anybody.

      2. It's illegal to send unrequested solicitations for pornography, specious medical programs, and m
  • by brejc8 ( 223089 ) * on Saturday June 21, 2003 @05:59PM (#6263628) Homepage Journal
    I am pleasantly surprised that my anti-spam encoded email address still has not been spammed. And even a recent spam study found that only normal email addresses got spam.
    It wouldn't take much to find and decode most of the simple spam-protected email addresses. And I don't think it would take long for the spammers to detect a system such as this and bypass it, but I don't think they will bother in the current climate.
    But pretty soon I suspect we will get much cleverer email collecting tools and the problem is going to get to the scale of the virus/anti-virus stage.
    • by Black Parrot ( 19622 ) on Saturday June 21, 2003 @06:09PM (#6263660)


      > I am pleasantly surprised that my anti-spam encoded email address still has not been spammed. [...] It wouldn't take much to find and decode most of the simple spam-protected email addresses. [...] But pretty soon I suspect we will get much cleverer email collecting tools and the problem is going to get to the scale of the virus/anti-virus stage.

      Then we'll start putting "nospam" in our real addresses!

      • by mistered ( 28404 ) on Saturday June 21, 2003 @09:29PM (#6264558)
        Then we'll start putting "nospam" in our real addresses!

        I do. I use myid-nospam@my_domain.org for news groups, dubious web site forms, etc. In several years, I've received exactly one spam at that account. It looks like many of the harvesters remove any address with "spam" in it, because they think it's likely fake (or at least harvester-proofed).

        By far most of my spam comes to my old eBay account. Luckily that was myid-ebay@my_domain.org, which will soon be removed in favour of a slightly different permutation.

        • "Luckily that was myid-ebay@my_domain.org, which will soon be removed in favour of a slightly different permutation."

          Okay, serious questions from folks at work then:

          If you have x users with a firstname.lastname@domain email address each, is it possible to set up a mailserver such that firstname.lastname.*@domain reaches each person's mailbox, * being a wildcard?

          I know this is possible using the 'default' account and filtering: I do this myself, but we'd need to integrate it into a 'proper' email server,
          • It's definitely possible. I know some people who use this functionality to run small personal mailing lists, by having e.g. their_id-their_list@whatever delivered to their own mailbox. From there they set up the mail to be resent to everyone on their list.

            That being said, I haven't done it myself; I just have tons of entries in /etc/aliases. I'm willing to bet, though, that some Google searching will turn up more information. I'll also bet that it'll be difficult to impossible with Exchange.

          • Typically this is set up as

            userid+parameter@foo.com

            The exim.conf has a few lines you can uncomment to get it so that this will work.

            The reason I don't do this is that I don't know how to block a specific extension. I was using jl-ng@ for newsgroups (so that I could get email replies) and once it was getting to be too much, I changed the alias to a nonaccount so that it would bounce.
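The routing logic behind `userid+parameter@foo.com`, including a way to bounce one specific extension, can be sketched in Python as below. This is not exim configuration; the `route` function and the blocked set are hypothetical and only illustrate the decision a mail server would make.

```python
BLOCKED_EXTENSIONS = {"ng"}  # hypothetical: tags that now attract too much spam


def route(address: str):
    """Return the mailbox for userid+parameter@domain, or None to bounce."""
    local, _, domain = address.partition("@")
    user, _, extension = local.partition("+")
    if extension.lower() in BLOCKED_EXTENSIONS:
        return None
    return f"{user}@{domain}"
```

For example, `route("jl+ebay@foo.com")` delivers to `jl@foo.com`, while `route("jl+ng@foo.com")` returns `None`, so only the burned tag bounces.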
    • I have a separate email address on some of my business cards for a non-computer business. Within a few weeks of handing them out I started getting spam to the address. Since this email addy doesn't appear anywhere online on either my webpage or in any document, it must have been input from someone else. Maybe someone's digital Address Book got pilfered or whatever... I suspect some Outlook virus that harvested my address from someone's vulnerable computer. Point is that even a non-digital address is not saf
    • Quoting an article in soc.motss [google.com]: (from April 28th)

      Lars Magne Ingebrigtsen <larzi@nospamgnus.org> writes:

      > But just to test out that theory, this message has the address
      > larzi@nospamgnus.org. If I get mail to that address without the
      > spammy bits, then spammers have, indeed, grown brains.

      Stop the presses! I just got a spam to that despammed address.

      "You WILL make $7,500/month or it's FREE!"

      They've apparently been growing brains. Probably hydroponically.

      So some spammers have figur

  • A new RBL? (Score:4, Interesting)

    by astrashe ( 7452 ) * on Saturday June 21, 2003 @06:01PM (#6263634) Journal
    I wonder if maybe someone could create a network of honeypots, and feed the data into a database that could be accessed in real time by web servers, to deny access.

    It would probably impose too much of a performance hit for a popular site, but maybe for smaller stuff -- your bio page, or whatever -- it would be appropriate.
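DNS-based blocklists already offer this kind of real-time lookup: the client reverses the octets of the IP address, appends the list's zone, and resolves the resulting name. A Python sketch of building that query name is shown here; the zone `harvesters.dnsbl.example` is hypothetical.

```python
import ipaddress

ZONE = "harvesters.dnsbl.example"  # hypothetical blocklist zone


def dnsbl_query_name(ip: str, zone: str = ZONE) -> str:
    """Build the name a DNSBL client would look up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

A web server could resolve that name before serving a page and treat a successful answer as "listed"; caching the result per IP would limit the performance hit mentioned above.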
  • by anubi ( 640541 ) on Saturday June 21, 2003 @06:06PM (#6263653) Journal
    It's been my experience that even though you find out which IP the harvesting spider operated from, they only sell their harvested stuff to mass marketers, and it passes through several layers of people before ending up in the hands of those doing the mass mailings.

    These guys come like a thief in the night. They load your page like any other search engine spider. It's like knowing the face of the guy who went through your neighborhood, trying every door knob in the guise of distributing an advertising flyer, then later he disclosed to other thieves, unknown to you, who's at home during the day and who is not.

    Yes, it's helpful in building a case, like knowing who is going through a neighborhood trying all the doors, but catching the actual guy in the act is not as easy.

    Some of this spam is really getting nasty. Just two days ago, I received this spam in my box purporting to be from the fraud department of Best Buy regarding CD players some guy in New York is trying to buy with my credit card. It seemed a really professional email, except they didn't know my name, and apparently had to get my email addy from a national credit bureau agency. When the links did not point as shown, I really became leery. The whole thing was apparently a ruse to get me to log into their site and disclose all sorts of personal information, playing on my fear that if I did not do so, the fraudulent transaction would complete.

    Watch out, guys. There's a lot of deception going on out there.

    Any tools and techniques we make to help us find out who these little rascals are is really welcome. Being some students just got nailed for their life savings for just their involvement in sharing a few songs, I trust this same environment can be used for those involved in internet scams which often cost not just a few record sales, but often substantial, I mean really substantial, grief for the victim.

  • Easily defeated (Score:2, Interesting)

    by BuilderBob ( 661749 )

    Surely the email harvester will just 'learn' to remove its own IP number and possibly a date (or even better, just increment the IP number or date to generate an infinite number of email addresses)

    A more advanced method would probably hash the IP with the date in a non-obvious way, but it'd have to be a one-to-one mapping of IPs at least, and a two-way hash to retrieve the IP number.

    Even storing the IP number as the apache-log line (if that's possible) would work, but real addresses would always work bett
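One way to get a reversible, tamper-evident encoding along the lines suggested above is to pack the IP and date into bytes and append a keyed MAC. This Python sketch swaps the "two-way hash" for plain base32 encoding plus a truncated HMAC-SHA256 tag: the IP is not hidden from anyone who decodes the base32, but a harvester cannot increment the IP or date without the tag failing to verify. `SECRET`, `encode`, and `decode` are assumed names.

```python
import base64
import hashlib
import hmac
import ipaddress
from datetime import date

SECRET = b"change-me"  # assumption: a key known only to the site owner


def _tag(payload: bytes) -> bytes:
    """Truncated HMAC-SHA256 over the packed IP and date."""
    return hmac.new(SECRET, payload, hashlib.sha256).digest()[:8]


def encode(ip: str, day: date) -> str:
    """Pack IPv4 address (4 bytes) and date (3 bytes), append the tag, base32-encode."""
    payload = ipaddress.IPv4Address(ip).packed + day.toordinal().to_bytes(3, "big")
    return base64.b32encode(payload + _tag(payload)).decode("ascii").lower()


def decode(local_part: str):
    """Return (ip, date), or None if the input is malformed or the tag is wrong."""
    try:
        raw = base64.b32decode(local_part.upper())
    except ValueError:
        return None
    payload, tag = raw[:7], raw[7:]
    if len(raw) != 15 or not hmac.compare_digest(tag, _tag(payload)):
        return None
    ip = str(ipaddress.IPv4Address(payload[:4]))
    return ip, date.fromordinal(int.from_bytes(payload[4:], "big"))
```

Base32 is used rather than base64 so the local part survives mail systems that fold case.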

    • Re:Easily defeated (Score:3, Informative)

      by DMDx86 ( 17373 )
      I've had problems with Cyveillance and my domains. I have a few domains that I don't use anymore, but they still point to my servers, though they don't have any records in my DNS servers.

      Their robots tried to crawl those domains - they kept on querying my DNS servers for about 10 minutes straight even though there was no record for that domain on my DNS
      • Store /usr/dict/words in a database, or find a list of names on the web you can use
      • Get two words, separate them with a "." and call that the reference tag
      • Store the IP and the reference tag in the DB
      • Write the mailto: url using the reference tag as the name part of the email address.

      The downside is 2 selects and an insert on a DB for every page, but most sites are database-driven now anyway, and those that aren't probably don't care about the delay...
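The reference-tag steps above might be sketched in Python with SQLite as follows. The six-word list stands in for /usr/dict/words, and the table layout, function names, and `example.com` are assumptions. This version relies on a primary-key constraint plus one insert instead of the two selects mentioned above, and with such a tiny word list it would loop forever once all 30 tags are used.

```python
import random
import sqlite3

WORDS = ["amber", "birch", "cedar", "delta", "ember", "fjord"]  # stand-in word list

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tags (tag TEXT PRIMARY KEY, ip TEXT NOT NULL)")


def mailto_for(ip: str, domain: str = "example.com") -> str:
    """Pick an unused two-word tag, store it with the visitor IP, return a mailto: URL."""
    while True:
        tag = ".".join(random.sample(WORDS, 2))
        try:
            with db:  # commits on success, rolls back on error
                db.execute("INSERT INTO tags (tag, ip) VALUES (?, ?)", (tag, ip))
        except sqlite3.IntegrityError:
            continue  # tag already handed out, try another pair
        return f"mailto:{tag}@{domain}"


def ip_for(address: str):
    """Look up which visitor IP was shown this address, or None."""
    tag = address.split("@")[0]
    row = db.execute("SELECT ip FROM tags WHERE tag = ?", (tag,)).fetchone()
    return row[0] if row else None
```

Because the tag carries no visible IP or date, a harvester has nothing obvious to strip or increment.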

      As for getting the spammers not the harvesters,

  • by Anonymous Coward
    And also not require register_globals be on (better for security if you can set it to "off"):

    <a href="mailto:<?php echo $_SERVER['REMOTE_ADDR'],'_on_',date('y_m_j_Gi'),'@EXAMPLE.COM'; ?>" title="Go ahead, Spam me">Here is my email address</a>
  • fighting spam (Score:5, Interesting)

    by daserver ( 524964 ) on Saturday June 21, 2003 @06:18PM (#6263688) Homepage
    The only email address I have on my site is blockme@mydomain and if anyone sends an email to that one they get blacklisted. Easy but effective.
    • I guess you have already blocked me then, even though I've never sent spam. Someone else, however, has sent spam using my name, something I don't find out about until I get bounce messages. I know that I'm not the only person to be a victim of this.

      • Re:fighting spam (Score:3, Interesting)

        by leeward ( 313589 )

        Generally blocking is done by IP address, not email address. So when the OP receives a spam addressed to blockme, I assume his software adds the source IP address the email came from to his blocklist. So you are not blocked.

  • by wheany ( 460585 ) <wheany+sd@iki.fi> on Saturday June 21, 2003 @06:26PM (#6263717) Homepage Journal
    You can often do this even without a throwaway domain. Many addresses can be tagged by adding a "+" (plus-sign) and anything between the user name and the @-sign.

    For example wheany+sd@iki.fi, wheany+SpamTastesGood@iki.fi, wheany+glahglahglag@iki.fi, wheany+spammer.com_on_06_22_2003@iki.fi all go to the same mailbox.
    • Many addresses can be tagged by adding a "+" (plus-sign)

      A startling number of sites (eBay is one, or was last I checked) refuse addresses formatted like this. Sanity-checking run amok, I assume. I've occasionally emailed site admins to point out that they're rejecting RFC-valid addresses, and the answer is invariably "Just set up a throwaway yahoo account to register then."

      (My answer to *that* is invariably "Your site's not worth the trouble.")
  • by NewtonsLaw ( 409638 ) on Saturday June 21, 2003 @06:42PM (#6263771)
    Why bother with honeypots when a Payback Page [aardvark.co.nz] is far more satisfying :-)
    • Just wait til some spammer forges *your* address in the From field, and some payback parser picks it up and adds it to a poison-email page...

  • by darkpurpleblob ( 180550 ) on Saturday June 21, 2003 @06:54PM (#6263821)
    It wasn't Mark Pilgrim that described a simple way to identify email-harvesters. The link [diveintomark.org] shows it was George A. Theall in a comment on Mark Pilgrim's weblog.

    How Cheese Man got mixed up is beyond me, as "comment by George A. Theall" is clearly displayed at the bottom of the comment.

  • by Hollinger ( 16202 ) <michael@@@hollinger...net> on Saturday June 21, 2003 @08:01PM (#6264126) Homepage Journal
    You should do what I do, and set up a "tar pit" on your website, with a bunch of bogus randomly generated e-mail addresses, and links back to itself. On last count, I've handed out over 100,000 false e-mail addresses.
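A tar pit page of the kind described above can be generated with a few lines of Python. This is only a sketch: the `/tarpit/` URL prefix and the function name are assumptions, and the bogus addresses use the reserved `.invalid` top-level domain so they never reach a real mail server (which also makes them easy for a harvester to filter).

```python
import random
import string


def tarpit_page(rng: random.Random, n_addresses: int = 5, n_links: int = 3) -> str:
    """Return HTML with random bogus addresses and links back into the tar pit."""
    def word(length: int = 8) -> str:
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    items = [
        f'<a href="mailto:{word()}@{word()}.invalid">{word()}</a>'
        for _ in range(n_addresses)
    ]
    items += [f'<a href="/tarpit/{word()}.html">more</a>' for _ in range(n_links)]
    return "<html><body>\n" + "<br>\n".join(items) + "\n</body></html>"
```

Serving `tarpit_page(random.Random())` for every URL under the prefix gives a crawler an endless supply of fresh pages and addresses.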
  • mod_spam_die (Score:5, Informative)

    by c_g_hills ( 110430 ) <chaz.chaz6@com> on Saturday June 21, 2003 @08:01PM (#6264130) Homepage Journal
    Another tool to throw a spanner in the works for spammers is mod_spam_die [sourceforge.net] for Apache. It generates a random page with recursive links and fake addresses, thus causing the spammer's database to fill up with useless addresses. There's an example at chaz6.com/spam_die [chaz6.com].
  • This is beautiful. And all the other suggestions bring joy to my heart!

    I just wish someone would invent a way that sends a 100,000 volt/amp jolt back to the spammers so that all that's left to be found is a pile of smoking ashes where they were sitting when they went to check their in box...

  • That's a great idea.

    If I ever turn to the dark side and support spam, I'll have to modify my email harvester to discard those. I actually only spent a few hours working on it, but it overcomes some email protection techniques by using a real browser to load the pages (minus images & such), allowing any email descrambling scripts to run. A way to improve it might be to have it "click" all the javascript links on the page, catching attempts to browse to an email link but not actually allowing the browser
  • by ewhac ( 5844 ) on Saturday June 21, 2003 @09:12PM (#6264473) Homepage Journal

    So what happens under this scheme when a harvester bounces all their page requests through an open proxy? Does the recorded IP address mis-identify the proxy as the harvester?

    I have Zope running on an unpublished IP address and port on one of my machines. About once a day, someone tries to reflect a connection through it, like so:

    66.118.187.8 - Anonymous [30/May/2003:09:10:05 -0700] "CONNECT 64.12.136.89:25 HTTP/1.0" 404 264 "" ""

    Apparently there are enough mis-configured Web proxies out there (like older RedHats running Squid) to make this type of probing worthwhile. Does this honeypot account for this?

    Schwab

  • Better PHP code (Score:5, Interesting)

    by Sanity ( 1431 ) * on Saturday June 21, 2003 @09:45PM (#6264623) Homepage Journal
    Here is some PHP code that will do something similar - it just encodes the IP address, but it does so much more efficiently - resulting in email addresses as short as "fwAAAQ@blah.com". The fwAAAQ can then be decoded using base64_decode to get back to the original IP address.

    // Pack the visitor's IPv4 address into four raw bytes.
    $remaddr = $_SERVER["REMOTE_ADDR"];
    $ips = explode(".", $remaddr);
    $bst = "";
    foreach ($ips as $b) {
        $bst = $bst . chr(intval($b));
    }
    // Base64-encode the bytes and drop the "=" padding.
    $out = str_replace("=", "", base64_encode($bst));

    echo("<a href=\"mailto:$out@blah.com\">email me!</a>");
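
    Going the other way, when spam arrives at one of these addresses, might look something like the sketch below. decode_ip() is a made-up helper name, not part of the code above. It assumes you pass in only the part before the "@", and that the address was built from a four-byte IPv4 address as in the encoder.

```php
<?php
// Hypothetical helper: turn the local part of a honeypot address
// back into the dotted-quad IP that was encoded into it.
function decode_ip($local)
{
    // Restore the "=" padding that the encoder stripped off.
    $padded = $local . str_repeat("=", (4 - strlen($local) % 4) % 4);
    $raw = base64_decode($padded, true);
    if ($raw === false || strlen($raw) !== 4) {
        return null; // not a valid 4-byte IPv4 payload
    }
    // Convert each raw byte back into its decimal octet.
    $octets = array();
    for ($i = 0; $i < 4; $i++) {
        $octets[] = ord($raw[$i]);
    }
    return implode(".", $octets);
}

echo decode_ip("fwAAAQ"), "\n"; // prints 127.0.0.1
```

    The padding has to be put back first because the encoder removed it. Anything that does not decode to exactly four bytes comes back as null rather than as a bogus address.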
  • <?

    // spam bait with host signature by sonny w.
    // use freely

    // this creates dummy email address with IP
    // of email harvester, but it is less obvious
    // than some examples posted earlier.

    define( "_SPAM_SIGNATURE","goatse"); // custom prefix (for your mail filter)
    define( "_MAIL_HOST","mydomain.com"); // your mail honeypot domain
    define( "_SPAM_OFFSET",131435); // whatever you like

    function SpamCode($IPquad)
    {
    if (ereg("([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})", $IPquad, $result))
    {
    $My
  • by jroysdon ( 201893 ) on Sunday June 22, 2003 @02:35AM (#6265724)
    Only just today I posted this article [artoo.net] about how not to get spam for users of my servers. When 97% of all spam emails within a 6 month period come from website-harvested addresses, it's pretty clear that posting your email address on a website is just plain stupid. Use a form to allow users to contact you [roysdon.net], but never allow them to be able to get your address.
  • This is easy enough to do. Check out my top-level index (the one above this article) -- there's an email address there that delivers, and adds the delivering server to my local blacklist. It contains the harvester's (or other visitor's) IP address, cheesily encoded.

    Ya know what I've found? The harvester bots are almost all running on cable modems. They use them for a while, then throw them away. And they rarely, very rarely, send spam from the same host that's out harvesting. In my experience, the harve
  • Not Mark (Score:3, Insightful)

    by dorward ( 129628 ) on Sunday June 22, 2003 @08:14AM (#6266421) Homepage Journal

    Mark Pilgrim describes...

    No he doesn't, George A. Theall does, in a comment attached to an article by Mark.

  • by kasperd ( 592156 ) on Sunday June 22, 2003 @11:20AM (#6267170) Homepage Journal
    I did a few small honeypots for the spammers to play with. SMTP [daimi.au.dk] and proxy [daimi.au.dk].

"The medium is the massage." -- Crazy Nigel
