Amazon Bots Cause Grief For Associate Web Sites

theodp writes "Amazon Associates and Web Services developers are crying foul over the hammering they're taking from ill-behaved bots that Amazon had subsidiary Alexa Internet dispatch to evaluate the 'quality and reliability' of their sites. Amazon fessed up and acknowledged problems exist, but points to recent Operating Agreement changes that not only give Amazon and any of its corporate affiliates the right to do so, but also to use unstated technical means to overcome any methods that are used to try to block or interfere with such crawling or monitoring. Interesting stance from the folks who called on the Senate to prosecute those who degrade the technical quality of service at web sites."
This discussion has been archived. No new comments can be posted.


  • by Jan0815 ( 18673 ) on Monday December 09, 2002 @06:52AM (#4842905)
    I am not able to view any of the mentioned links. Keeps on redirecting between login and some other page.

    Funny to see that someone complaining about abuse links to pages that do not work with Webwasher filtering.
    • could be (Score:2, Interesting)

      by hairmare ( 577156 )
      .. something about not accepting any cookies? cookie filtering is just great ;)
      • Re:could be (Score:4, Insightful)

        by Jan0815 ( 18673 ) on Monday December 09, 2002 @08:09AM (#4843039)
        Hehe. In fact I am filtering cookies, scripts, popups, referrer, webbugs etc.

        So I guess I am not very informative about my habits - which I think is my freedom to do. And if a site doesn't work that way, the site owners clearly indicate that they are not willing to accept me as a visitor - which is their freedom.

        At least /. works well that way ;-)
          So I guess I am not very informative about my habits - which I think is my freedom to do. And if a site doesn't work that way, the site owners clearly indicate that they are not willing to accept me as a visitor - which is their freedom.

          Your logic is wrong.

          The owners of a website, especially if they are not aware of your habits, are not rejecting ('not accepting') you as a visitor / customer.

          At the worst, they're not making an effort to accommodate your nonstandard way of browsing the web. YOU were the one who chose to apply filters--hence, the active part in the exchange is you, not the website owner.
          • My logic is wrong?

            Explain to me: when exactly did cookies become something I MUST enable? When did Web bugs become standardized? At what point did having no referrer become ridiculous? And exactly when was it agreed that JavaScript is part of the HTML/XHTML/HTTP standard?

            Exactly what is non-standard in my way of browsing the web? If you mean unusual I could agree, but non-standard is wrong.

            • Re:could be (Score:2, Insightful)

              Exactly what is non-standard in my way of browsing the web? If you mean unusual I could agree, but non-standard is wrong.

              You're confusing technical uses of the word with colloquial ones. Web standards have _never_ been the normal case; there has always been some tweak or extension that makes the web usable in ways that a significant proportion of the decision-makers seem to like.

              Your adherence to Standards is a non-standard act (an act against the norm, unusual, et al), and as such it is an unforeseen action on your part and not an action on the part of the website owner & their developers to exclude you.

              My point wasn't about standards, it was about whose action caused you to be unable to use the non-standard commercial websites. Since non-standard design is the norm for the industry, it's a case of the website failing to take extraordinary action (making their sites standards-compliant) to keep you as a visitor, and not them taking action to deny you as a visitor.

              At best, they're ignorant and you're suffering from the consequences of your choice. At worst, they're guilty of not wanting to expend the effort to accommodate you--but that only happens if they have the ability & opportunity to meet your needs. (i.e., only to those websites that you contact with a request for a toned-down main or alternate version of their web page that you can visit.)
              • a request for a toned-down main or alternate version

                LOL. A website that simply works is "toned down"?

                I can't tell you how many times I've come across a website that DOES work perfectly, except they ADDED code to redirect you to a basically blank page saying "this site requires Internet Explorer/cookies/javascript/whatever". I have sometimes disabled this and used the site.

                My point wasn't about standards, it was about whose action caused you to be unable to use the non-standard commercial websites.

                In many cases they take a perfectly good website and disable it with these tests. Having cookies/javascript/whatever off may cause specific features to not work, but you actually have to go out of your way to design the entire site to fail.

                If I have cookies off and it doesn't remember my preferences, fine. If I have javascript off and help windows don't work, or the formatting is lousy, fine. But to fail to display ordinary text? That nearly has to be intentional; even gross incompetence is usually partially functional.

                -
  • Boohoo!! (Score:5, Interesting)

    by Ratface ( 21117 ) on Monday December 09, 2002 @06:56AM (#4842911) Homepage Journal
    Given that many people still boycott Amazon [noamazon.com] for their stance on software patents, I guess that they won't be shedding many tears.

    One could argue something about watching out for who your bed-partners are! Bear in mind that a company that has such a disregard for even their affiliates has to have a pretty poor respect for anyone else out there! Caveat emptor!

  • by Deal-a-Neil ( 166508 ) on Monday December 09, 2002 @06:59AM (#4842918) Homepage Journal
    We've noticed quite a few requests for robots.txt by the Alexa archiver. So a suggestion, to boot, may be to throw this into the root directory of your domain's web site (in a file called robots.txt):

    User-agent: ia_archiver
    Disallow: /

    And if it's really annoying, bloody hell, just do an active firewall block (a rough web-server-level sketch follows) and put the sharks (lawyers) away with those goofy lawsuits before they start wasting our senators' time and taxpayer cash.
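
    Short of a true firewall rule, a minimal sketch of an Apache-level block by User-Agent (mod_setenvif; the "ia_archiver" string and environment name here are just illustrative - adjust to whatever shows up in your own logs) would look something like this in an .htaccess:

    SetEnvIfNoCase User-Agent "ia_archiver" block_bot
    Order Allow,Deny
    Allow from all
    Deny from env=block_bot

    A real firewall drop would need the crawler's source addresses, which you'd have to pull from your own access logs.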
  • by dagg ( 153577 ) on Monday December 09, 2002 @07:05AM (#4842928) Journal
    I'm an Amazon associate, and I've been following this problem. Amazon's web-bots are looking for outdated links to books that don't exist, etc. The reasoning is that if the associate fixes the dead-links, then Amazon (and the associate) will presumably make more money.

    The problem is that the bots are way too diligent. They go to every single link on every single page, even if the page is dynamically generated. Many sites have an infinite combination of URLs, and as a result, the bots sit on them trying to download every single query variation. That means that Joe Amazon Associate's web site is hammered with requests and his bandwidth fees go through the roof.

    The simple solution would be to just stop Amazon from sucking up the bandwidth via a robots.txt file. But Amazon says that is not allowed. There's the dilemma.

    Amazon.com has been silent on this issue for the last several days. My bet is that the bots won't come back without some heavy-duty tuning.

    --Your Sex [tilegarden.com]

    • most bots have this problem when they're initially made.

      Remember when you could boost your ratings on Google by trapping the bots?
      • And for the Amazon associate who is really smart, that would be a way out of the problem right now.

        You just put an invisible link at the top of your page and assume anything that follows it must be a robot. But instead of trapping it, you switch to make all your normal links go nowhere.

        Voilà! No more robot.
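
        A rough sketch of that trap in Python CGI terms (the trap URL, the flag file and the page text are all made up for illustration, not anything Amazon-specific):

        #!/usr/bin/env python
        # trap.py - target of a link that is invisible to human visitors.
        # Anything that requests it is assumed to be a robot and remembered.
        import os

        FLAG_FILE = "/tmp/robot_ips.txt"   # hypothetical store of trapped client IPs

        ip = os.environ.get("REMOTE_ADDR", "unknown")
        with open(FLAG_FILE, "a") as f:
            f.write(ip + "\n")

        print("Content-Type: text/html")
        print("")
        print("<html><body>Nothing here.</body></html>")

        The hidden link itself is just something like <a href="/cgi-bin/trap.py"><img src="clear.gif" alt="" border="0"></a> at the top of the page, and the scripts that build your real links check whether REMOTE_ADDR is in that file before deciding what to serve.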
    • Strange thing is I am not an Amazon associate and I get those bots.
    • Amazon in general seems as nutty as a fruitcake. I regularly get email messages notifying me that shipment of items has been delayed for stuff I've already received! HELLO! Today I'm informed that a DVD is "beckoning" me from my wish list when I already purchased it and watched it 2 weeks ago. HELLO! HELLO! Somebody over there needs to get the collective act together.
    • by Mike1024 ( 184871 ) on Monday December 09, 2002 @03:12PM (#4845511)
      Hey,

      Amazon's web-bots are looking for outdated links to books that don't exist, etc.

      Wouldn't a better solution be to modify the software at amazon.com, so that every time there was a book not found/out of date error, it logged the referring affiliate and the HTTP_REFERER request header? (Roughly sketched below.)

      I can't see why they would need bots and suchlike for such a simple procedure...

      Just my $0.02,

      Michael
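
      Roughly, on the catalogue side (a sketch only - the handler name and log location are assumptions, not Amazon's code):

      import logging

      logging.basicConfig(filename="broken_affiliate_links.log", level=logging.INFO)

      def log_dead_link(environ):
          # Called whenever a product page can't be found: remember which
          # affiliate page sent the visitor, via the Referer request header.
          logging.info("dead link: referer=%s path=%s",
                       environ.get("HTTP_REFERER", "-"),
                       environ.get("PATH_INFO", "-"))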
      • The problem with that is that both amazon and the referrer lose money on the broken link.

        A better idea would be for Amazon to keep a DB of referrers and the books they link to. When a book goes out of print, it sends off an auto email to the referrers of the book, letting them know they have a dead link.
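
        Sketched out (table layout, helper names and the mail setup are all illustrative assumptions):

        import sqlite3
        import smtplib
        from email.message import EmailMessage

        def notify_referrers(asin):
            # Find every associate page known to link to this product...
            db = sqlite3.connect("referrers.db")
            rows = db.execute("SELECT owner_email, page_url FROM referrer_links "
                              "WHERE asin = ?", (asin,)).fetchall()
            # ...and mail each owner instead of crawling their whole site.
            with smtplib.SMTP("localhost") as smtp:
                for owner_email, page_url in rows:
                    msg = EmailMessage()
                    msg["Subject"] = "Dead Amazon link on " + page_url
                    msg["From"] = "associates@example.com"
                    msg["To"] = owner_email
                    msg.set_content("Item %s is no longer available; please update %s."
                                    % (asin, page_url))
                    smtp.send_message(msg)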
  • Just block it? (Score:2, Interesting)

    by jeroenb ( 125404 )
    The Associates Operating Agreement [amazon.com] states:
    Therefore, you agree that we and our corporate affiliates may take such actions and that you will not seek to block or otherwise interfere with such crawling or monitoring (and that we and our corporate affiliates may use technical means to overcome any methods used on your site to block or interfere with such crawling or monitoring).

    As such, it doesn't say that you agree not to block them or that you're violating their license if you do block them. All you agree to is that they can monitor your site, but if you don't like how they do it, it doesn't state that you have to put up with their crawler. The only thing you do agree to is that they can use "technical means to overcome" your blocking. But so what? Let them waste money on attempting to monitor your site by modifying their crawler :) Does anyone believe they'd actually do that? Most likely they'll just leave you alone.
    • Re:Just block it? (Score:1, Informative)

      by Anonymous Coward
      Therefore, you agree that we and our corporate affiliates may take such actions and that you will not seek to block or otherwise interfere with such crawling or monitoring (and that we and our corporate affiliates may use technical means to overcome any methods used on your site to block or interfere with such crawling or monitoring).

      Actually, doesn't it say that you aren't allowed to block it, but if you do, they can try and get around it?
      • we ignore /robots.txt and we'll circumvent every action you take to not let us crawl your cgi-bins

        kneel down and I'll spank you, associates...?
    • Re:Just block it? (Score:5, Informative)

      by Anonymous Coward on Monday December 09, 2002 @07:40AM (#4842994)
      That is so blatantly wrong how can it be modded up to 4?!

      It says exactly that you agree not to block them!

      "you agree ... and that you will not seek to block or otherwise interfere with such crawling or monitoring"
      • Clarification (Score:2, Interesting)

        by jeroenb ( 125404 )
        If you agree not to block or interfere with crawling or monitoring, you're not telling them they can do whatever they want. You agree they can crawl and/or monitor your site, but not that they can do it in any way *they* want to.

        It's OK if they crawl/monitor my site using a bunch of people surfing my site all day long. I won't attempt to block that. Anything else, I might.
        • Re:Clarification (Score:3, Interesting)

          by Izeickl ( 529058 )
          There is no clause in the contract saying "You will not block our crawler/monitor as long as you deem it OK"; you quite simply agree to let them monitor it with no restrictions. The added clause "but not doing that in any way *they* want to" is your opinion and addition, not actually within the contract agreed, so unless you get a private agreement or they change it themselves, it's not written that you have to like the way they do it.
          • Unless they state in their contract *how* they're going to crawl/monitor, I do have the right to block whatever I want without violating this as long as I don't prevent them from crawling/monitoring at all. So yeah this is a pretty useless agreement, but it's mostly very stupid instead of restrictive (although everybody seems to believe the latter.)
        • "(and that we and our corporate affiliates may use technical means to overcome any methods used on your site to block or interfere with such crawling or monitoring)."
          Depending on exactly what "technical means" means, this sounds like:
          All your sites are belong to us.

      • Except how can you agree not to block them if you're one of the vast numbers of associates who run their pages from someone else's server?


        If I was running a network being clobbered by Amazon I would put up any barriers I felt like, such as dropping their packets, and there is not a damned thing they could say or do to stop me. I'm not an associate and it's too bad for them that they can't see fit to play nice.

  • All they have to do (Score:4, Informative)

    by TerryAtWork ( 598364 ) <research@aceretail.com> on Monday December 09, 2002 @07:18AM (#4842946)
    To make this palatable is to lower the request rate to something like 1 per minute.

    Most robots do something like that.

    Of course - it takes a lot longer....

    • by valisk ( 622262 ) on Monday December 09, 2002 @08:00AM (#4843022) Homepage Journal
      That's why they should set it to request at most 1 page per minute from any one site, but check out many thousands of sites during that one minute (a bare-bones sketch follows below).
      Robots have been around since the web started and it surprises me that the designers of this robot haven't looked at previous designs and good practice.
      If any of you Alexa numbskulls happen to be reading this, perhaps you could buy yourself a copy of HTTP: The Definitive Guide from O'Reilly, which has a tremendously clear explanation of what to think about to prevent your robots from destroying every site they visit that isn't sitting on a T3 and a Sun Fire with 64 CPUs and 64 GB of RAM.
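
      A bare-bones sketch of that per-host politeness (the fetch callback and the exact delay are placeholders):

      import time
      from collections import deque

      PER_HOST_DELAY = 60.0                      # at most one request per host per minute

      def polite_crawl(urls_by_host, fetch):
          # Round-robin across hosts so no single site absorbs the whole crawl.
          queue = deque(urls_by_host.items())    # (host, [url, url, ...]) pairs
          last_hit = {}                          # host -> time of its last request
          while queue:
              host, urls = queue.popleft()
              wait = PER_HOST_DELAY - (time.time() - last_hit.get(host, 0.0))
              if wait <= 0:
                  fetch(urls.pop(0))             # one URL for this host, then move on
                  last_hit[host] = time.time()
              else:
                  time.sleep(min(wait, 1.0))     # be patient rather than hammer
              if urls:
                  queue.append((host, urls))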
    • I read that at first as "To make this *patentable*...
  • Just goes to show (Score:4, Insightful)

    by zachusaf ( 540628 ) <zachary.thompson@gmai l . com> on Monday December 09, 2002 @07:30AM (#4842974) Homepage
    Nothing comes for free. Presumably, they are Amazon Affiliates to get a cut of each sold book. You don't get anything for free. Perhaps an opportune time to do the Barnes and Noble thing?
    • by Fly ( 18255 )
      That was the stupidest reply I've seen yet. (Probably because it's rated so highly.) The issue is not that Amazon is requiring something in return from their affiliates, but that they're inadvertently destroying their affiliates with a broken web-spider design.
  • by nuggz ( 69912 ) on Monday December 09, 2002 @07:43AM (#4843001) Homepage
    It looks like these people are signing agreements they didn't read or understand.

    They have a few options that I can see.

    Terminate the agreement.
    Bill for the bandwidth, or sue for damages.
    Various technical measures (which are prohibited by the agreement)
    Point out to your contacts at Amazon that this is pointless and dumb in such a manner they actually listen.

    Make a mini site for the Amazon site/bot, but put the rest of your website in a second location (that they don't have access to).

    Why deal with a company like this anyway? They're obviously inconsiderate pricks (at least); move on with your life.
    • They have a few options that I can see.

      Terminate the agreement.
      Bill for the bandwidth, or sue for damages.
      Various technical measures (which are prohibited by the agreement)
      Point out to your contacts at Amazon that this is pointless and dumb in such a manner they actually listen.

      Here's an idea.... How about politely posting a question or two about it in the appropriate forums? Who knows, something crazy might happen, like responsible people at Amazon might respond and turn the bot off while they investigate. Then, they might post a reasonable explanation and take reasonable steps to make sure they're not abusing associates' servers.

      Here's another idea.... Try reading the pages that slashdot linked to. I know that's a lot of work, so I'll save you a bit of effort by posting each slashdot link, and a brief summary of what you would have found had you bothered to click on it and ACTUALLY READ it (before posting here with a subject advocating actually reading the terms and conditions).

      • Amazon Associates and Web Services developers are crying foul over the hammering they're taking [prospero.com] - Alan Richmond comments that the bot made 13406 hits in 17 hours on November 26, transferring a total of 200 megs. Many posts precede this, and several follow it. It's all pretty level-headed discussion. Many people seem to feel the bot is not designed that well and ought to be improved, but very little of it amounts to "crying foul". Even Alan says he wants an explanation. Nobody is terminating their agreement, attempting to recoup significant losses, threatening to sue, or advocating blocking (other than discussion of robots.txt). People in the forum are expressing their concerns "in such a manner they actually listen", which happens to be a polite, level-headed manner... which you would know had you actually read the forum, rather than blindly posting here that the associates should read the terms and conditions before they "sign".
      • Amazon fessed up [prospero.com] - Amazon explains what they're doing, and why, and the steps they've taken to avoid abusing servers. They claim they've designed the bot to avoid accessing any server more than once every two seconds (Alan's example is 13406 hits in 17 hours, or one hit every 4.56 seconds, on average)
      • Amazon acknowledged problems exist [prospero.com] - They actually say they're investigating, and while they're investigating their bot's impact, they've taken it off-line. They also answer the question that appears frequently in the forum... the purpose of "ia_archiver" vs "amzn_assoc". It's not clear what they'll actually do, but they obviously are trying to respond to people's legitimate concerns
      • but points to recent Operating Agreement changes [prospero.com] - Yes, while Amazon appears to be taking the matter seriously, they also are making it clear that they expect to be able to verify the accuracy of links from associates. They explain the purpose in the agreement (and it's really not that unreasonable, is it?)

      This just isn't that sensational of a story. Yet another 'bot that needs some refinement, but it IS designed to avoid more than one hit every 2 seconds (and the evidence posted seems to be consistent with that). They at least did respond to people's concerns and they took the bot off-line while they investigated it. Sounds pretty reasonable. It's not clear what might actually be done, and in places Amazon appears to be claiming the problem isn't so great... but clearly they are attempting to respond to people's concerns.

      Amazon feels they have a right to check the links on associate sites, and they put it in the terms. Again, it's really not that unreasonable.

      What is unreasonable is the inflammatory summary appearing on the main slashdot page. Yes, timothy and other slashdot "editors" can claim it's all just editorial from "theodp" who submitted the summary. But what kind of editing is that?

      The summary concludes with:

      ... Amazon and any of its corporate affiliates the right to do so, but also to use unstated technical means to overcome any methods that are used to try to block or interfere with such crawling or monitoring. Interesting stance from the folks who called on the Senate to prosecute those who degrade the technical quality of service at web sites.

      The link is to Amazon's position on DDOS attacks... there's really no similarity to a well-intentioned 'bot, which clearly identifies itself, limits itself to 0.5 Hz access rate, AND was responsibly taken off-line and reexamined when some people complained that it used too much bandwidth.

      • I did read the links.

        Amazon released a bot that negatively affected the affiliate websites.

        This is at the very least inconsiderate.

        I posted my opinion how this or similar activities COULD be handled.

        You seem quite defensive about it, were you the one who wrote a buggy bot?
  • amazon... (Score:3, Insightful)

    by katalyst ( 618126 ) on Monday December 09, 2002 @07:44AM (#4843005) Homepage
    Seems to be going the Microsoft way. They seem to be exploiting their monopoly in their sphere of business. Their recent ploy to patent their click-n-buy commerce system attracted lots of attention from the public and the open-source community. Many open letters were exchanged. But people seem to have already forgotten; the average human, understandably, is worried only about factors that affect him, and that too, immediately. Now this new issue....
    • Re:amazon... (Score:3, Insightful)

      Market Leader != Monopoly. Yes, Amazon is the king of online shopping sites. But Amazon is far from a monopoly. Amazon faces a good deal of competition in most markets, not only from other websites, but also from brick & mortar stores. If you think that Amazon isn't competing with the bookstore down at your local mall, think again. Until that local bookstore closes, along with B&N.com, Amazon will have competitors. Amazon is far from a monopoly - just a very successful store.
      • They might not be a monopoly, but the Canadian Postal Office mail delivery trucks have AMAZON.COM written all over them. Government contracts for cheaper shipping sounds a bit monopolistic to me.
        • Actually, it sounds to me like the Post Office is competing with UPS & Fed Ex - in order to compete, they have to be competitive. That happens by offering nice contracts for cheaper shipping. The trade off is they get LOTS of shipping.

          Why is everyone so quick to cry monopoly? I'm one of the most anti-corporatist types I know, but just because a company is big, has large market share, and deals with government agencies (especially ones that compete directly with private industry) does not make it a monopolist.
          • I understand. That's why I say 'not a monopoly'. It just seems that the big corporations are getting bigger, and the little guys trying to scrounge a few bucks on the side are getting screwed by the companies they advertise and raise revenue for. There was a time when the internet was a free open space. There were ideas to be shared, thoughts to be provoked, and money to be made. Now there are big companies copyrighting mouse clicks and image dimensions, deciding where you can go to buy what you want, ISPs deciding what you can and can not see, and all the provocative thought in cyberspace has been relegated to 'nerds' at /. and extremetech. It's just a little disheartening that one day the internet will cease to exist as an entity of its own. It will be an affiliate to CNN-Time-Warner-MicroZon.
        • They might not be a monopoly, but the Canadian Postal Office mail delivery trucks have AMAZON.COM written all over them. Government contracts for cheaper shipping sounds a bit monopolistic to me.

          Which in turn means cheaper stamps for us to send mail with. I don't see anything wrong with Canada Post selling otherwise useless space on its trucks to Amazon. And the day you start shipping as much as Amazon does, don't worry. Canada Post will cut you a good deal too.

  • by jkcity ( 577735 ) on Monday December 09, 2002 @07:47AM (#4843008) Homepage
    http://forums.prosperotechnologies.com/n/mb/message.asp?webtag=am-associhelp&msg=2579.1&maxT=3

    OK, that is a post from the associates board, in which Amazon states:

    "Hello Associates.

    Thank you for providing such valuable feedback. The Alexa crawl (id amzn_assoc) has ceased while we investigate the statements made in this post. We plan to address the following concerns:

    1. The impact the crawler may have on bandwidth
    2. The number of pages the crawler hits per second
    3. How the Alexa crawler might identify and ignore AWS pages or links

    Points of clarification:

    1. Regarding Archive.org, Alexa has confirmed that material that is crawled by the 'amzn_assoc' crawler is not donated to the internet archive. It is used exclusively for the purposes of the Broken Link Reports.

    2. The Alexa crawler 'amzn_assoc' differs from the 'ia_archiver' crawler. The 'ia_archiver' can be excluded by using a robots.txt file and will not violate the Amazon.com Associates operating agreement.

    You should expect a response from us by COB Friday as it may take a few days to research your concerns. This issue is important to us and we will get it resolved. Thank you for your patience.

    The Amazon.com Associates Program"

    I participated in that conversation myself, though, and I don't think I saw one happy person there about the agreement being changed so that we have to let them crawl our sites as often as they like.

    cj.com reports error links, but they do it from the server end; Amazon's system is just stupid, and it was only done to try and give their Alexa company some work to do.

    So I guess it's just wait and see now till we know if the bot starts back up again.
  • 1984? (Score:2, Funny)

    I haven't read 1984 in a long time, but I don't remember Big Brother coming from the Amazon.
  • by Anonymous Coward on Monday December 09, 2002 @08:00AM (#4843024)
    Interesting stance from the folks who called on the Senate to prosecute those who degrade the technical quality of service at web sites
    Whoah! That'd mean Slashdot would have every senate lawyer after it right?
  • by Anonymous Coward on Monday December 09, 2002 @08:26AM (#4843069)
    Seems every other link on the 'net is a link to some book on Amazon. All too often I'll follow an innocent looking link and find myself at Amazon yet another time.

    Reminds me of that old horror movie where they try to drive away from a haunted house, but every road they take leads them back up the driveway to the place.

  • I wonder... (Score:3, Funny)

    by Anonymous Coward on Monday December 09, 2002 @08:44AM (#4843100)
    "... called on the Senate to prosecute those who degrade the technical quality of service at web sites." Would that include the Slashdot effect?
  • by pla ( 258480 ) on Monday December 09, 2002 @09:06AM (#4843152) Journal
    Simple 'nuff...

    Just temporarily (perhaps 1 day) block ANY client's class C (not just that of Alexa's crawler) that starts generating more than X hits per second for longer than five minutes.

    By doing so, you haven't taken steps to specifically thwart *Amazon's* activity, you have simply enacted a reasonable security measure to block DOS attacks (rough sketch of the detection logic at the end of this comment). If Amazon actually dared to sue for blocking them, you'd have a HELL of a countersuit on the grounds that their 'bot triggered your DOS alarm.

    Personally, I'd just block their bot and if they complain, tell them where they can stick their partner agreement. No self-respecting online retailer needs their own "partners" degrading their QOS. Anyway, when I want to buy something, I use either Google, or a product-specific price-search engine (like PriceWatch). Amazon counts as my LAST choice for finding something (actually not quite true... If I need to use Google to find a product for sale, I often check Amazon first, just to get things like UPC or ISBN numbers to narrow my search).
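
    The detection half of that class-C blocking idea is only a few lines (thresholds, the /24 granularity and the block() hook are all placeholders; the actual blocking would be a firewall rule or server ACL):

    import time
    from collections import defaultdict

    HITS_PER_SEC = 5          # "X hits per second"
    SUSTAINED = 300           # sustained for five minutes
    BLOCK_SECONDS = 86400     # temporary one-day block

    recent = defaultdict(list)                    # class C prefix -> request timestamps

    def record_hit(ip, block):
        prefix = ".".join(ip.split(".")[:3])      # treat the whole /24 as one client
        now = time.time()
        hits = [t for t in recent[prefix] if now - t <= SUSTAINED] + [now]
        recent[prefix] = hits
        if len(hits) > HITS_PER_SEC * SUSTAINED:  # rate held above the limit for 5 min
            block(prefix, BLOCK_SECONDS)          # e.g. add a temporary deny rule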
    • When I want to buy something, I use either Google, or a product-specific price-search engine (like PriceWatch). Amazon counts as my LAST choice for finding something (actually not quite true... If I need to use Google to find a product for sale, I often check Amazon first, just to get things like UPC or ISBN numbers to narrow my search).

      This should be called the fundamental (slashdot) attribution error. Assuming that we are representative of the market.

      Reminds me of a VC I know. They were sitting in a conference room back in 1998 hearing a pitch from an online bill presentment company. The partner's first objection was that obviously everyone already had online banking and bill payment. To prove it, he asked everyone in the room if they had online banking. Everyone did.

      Out in the real world, fewer than 2% of people had online banking.

  • by shawnwe ( 632565 ) on Monday December 09, 2002 @09:35AM (#4843232)
    Instead of crawling websites, why don't Amazon and other companies just require you to have a formatted index of all the links you provide on your website? Could be amazon.xml in the root (sketched below). And this file could be dynamic or hand-typed...

    http://www.yourwebsite.com/amazon.xml or http://www.somewebsite.com/~yoursite/amazon.xml
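
    Purely as an illustration of what such a hand-rolled index might look like (the element names, associate tag and dummy ASINs are invented, not any real Amazon format):

    <?xml version="1.0"?>
    <amazon-links associate-id="yourtag-20">
      <link asin="0123456789" page="http://www.yourwebsite.com/reviews/widget.html"/>
      <link asin="0123456790" page="http://www.yourwebsite.com/reviews/gadget.html"/>
    </amazon-links>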
    • by tomblackwell ( 6196 ) on Monday December 09, 2002 @10:27AM (#4843459) Homepage
      There is no guarantee that the "formatted index of all links" is accurate, or up-to-date. Amazon wants to make sure that every single amazon affiliate link meets their criteria.

      Your solution would work only for the intelligent and diligent and lucky. There are many Amazon affiliates who are neither.
  • by Alioth ( 221270 ) <no@spam> on Monday December 09, 2002 @09:58AM (#4843319) Journal
    A while back (when I was still using a CobaltRaQ2 - adequate for the job, but not particularly speedy with cgi scripts) I got DoSSed by ia_archiver (yes, cgi-bin is in robots.txt; no, I'm not associated with Amazon, but someone else who links to the cgi script in question probably was). I thought ia_archiver was another Teleport Pro, and just modified the actual script to display a rejection page if it saw ia_archiver in the HTTP_USER_AGENT (roughly as sketched below).

    Finally, I know what it is...

    It was trying to crawl *every* available url for the CGI script - and it appeared to be buggy because it got itself into an endless loop changing from one mode to the other.
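
    For anyone wanting to do the same, the check is only a few lines at the top of a CGI script (a sketch; the original was presumably a different script and the exact rejection text doesn't matter):

    #!/usr/bin/env python
    import os
    import sys

    # Turn the crawler away before doing any expensive work.
    if "ia_archiver" in os.environ.get("HTTP_USER_AGENT", ""):
        print("Content-Type: text/plain")
        print("")
        print("Automated crawling of this script is not permitted.")
        sys.exit(0)

    # ...normal CGI processing continues here for everyone else...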
  • Is it time to add Amazon to the /etc/hosts.deny file?

    If you're a member company, employing Amazon's services, then in my opinion you should be responsible for providing Amazon with the links you want Amazon to vend, not that Amazon should crawl through your site for your pricing information...
  • You know, I find it really ironic that the page [prospero.com] where they explain how they're looking for broken links has a link to alexa.com/associates that's broken.

    Goofballs.

  • by Flow ( 22148 ) on Monday December 09, 2002 @10:41AM (#4843539) Homepage
    If you don't like the tactics of Amazon, there are alternatives. One of the best is BookSense.com [booksense.com]. Not only do they offer an affiliate/partner program, you'll also be supporting independent bookstores (rather than the chains or Amazon):

    http://www.booksense.com/affiliate/ [booksense.com]

  • Can someone clarify? (Score:4, Interesting)

    by callipygian-showsyst ( 631222 ) on Monday December 09, 2002 @10:44AM (#4843555) Homepage
    What's the deal here? It's hard to believe this is malicious--probably just the result of Amazon hiring the cheapest possible kids to do the Perl hacking/crawling. If they had hired more experienced professionals, they might have been able to crawl their affiliated sites better.

    Amazon is crawling these sites so that they can be featured on their website. When you search for an item, Amazon lists the prices and availability from the associates--everyone wins.

    It seems that Amazon is searching a bit too often--combined with some affiliated sites that have very s-l-o-w dynamic pages, which is causing some problems. It's hardly a crime that Amazon is committing--after all, they want the most accurate, up-to-the-minute information on their website.

  • by r2ravens ( 22773 ) on Monday December 09, 2002 @10:47AM (#4843571)
    The timing of this problem is interesting. A few years back, we had the problem of the one-click patent and the fact that Amazon used it to disrupt the Christmas sales of Barnes and Noble. It seems that the one-click thing became a less pressing problem on December 26. Although I can't remember the specifics of other events, it sticks in my mind that other ploys used to disrupt competitors' businesses have been timed to screw with the Christmas season.

    I know that the people being DOS'ed by Amazon are defined as 'affiliates', but maybe Amazon perceives 'affiliates' in the same way Microsoft perceives 'partners': people to use and then buy or destroy. How much you wanna bet that this problem goes away after Christmas? Of course, the claim will be that it was brought to their attention and it was fixed, but the timing of the whole thing is very suspicious. Perhaps this was the plan all along.

    In these days of slim margins in business, maybe Amazon figures the average internet user is smart enough to figure that if their preferred site is slow, they will go directly to Amazon for their purchase, and Amazon would be able to avoid reimbursing their 'affiliate' for the sale.

    Has this problem been going on, but been unnoticed for a while, or did it just start? I'm no conspiracy theorist, but the elements seem to be there for this to have been intentional and the timing is very suspicious. Why couldn't they have done this last month, or the month before, if they're just checking for outdated links? Am I out in left field with this idea?

    Anyway... just a different perspective and some food for thought.

    • "I'm no consipiracy theorist, but..."

      You just KNOW that when someone uses that line, then you are in for a nice whacky conspiracy theory that doesn't stand up to more than half a second's scrutiny. And you just confirmed that.

      Hint - IF Amazon were deliberately DOSing a site (as opposed to simply running a link-checking robot written by a clueless moron, as is the case here), THEN the site would be too slow for people to even GET to the Amazon links, and thus they would not think to go to Amazon directly (why would they go to Amazon if they don't know there is something being recommended in the first place)?

      "I'm no consipiracy theorist, but it's all a conspiracy, I tell you!"
    • I'm no conspiracy theorist, but the elements seem to be there for this to have been intentional and the timing is very suspicious.

      Like the timing of responding publicly quite promptly.

      Like the timing where they disabled the 'bot soon after some people posted concerns about it?

      If it really were some sinister plot to rob associates of their referral fees (which could be done much more easily by simply making accounting errors, Enron or RIAA style), don't you think they would have remained silent, or at least kept the 'bot running as a lengthy "thorough investigation" proceeded until the 26th?

  • Amazon pays so much in affiliate fees that they can have all the bandwidth they like from us ... I've seen much worse crawlers, from german search engines to broken proxies doing 10 hits/second on dynamic pages to stupid windows users who wanted to make our (very dynamic) website available for offline browsing. If you can't take a few 1000 hits/day because your CGIs are so slow, then what is your site doing on the web anyway? ;-)
  • Powells Offers More (Score:1, Interesting)

    by Anonymous Coward
    Powells Books [powells.com] offers a better associate program for web sites. Why even deal with Amazon's crap?
  • Informed View (Score:5, Informative)

    by peterdaly ( 123554 ) <{petedaly} {at} {ix.netcom.com}> on Monday December 09, 2002 @11:58AM (#4843990)
    I am an Amazon Associate who has experience with the Alexa Crawler. I believe the crawl is intended to find broken links, or links to products that are no longer stocked.

    The Amazon Associates program has been around long enough for "page rot" to kick in, and I am sure there are many sites out there with links to non-existent products, such as old editions of books, etc. Historically, associates had to build static links (for the most part) by hand, and embed them in more or less static pages.

    The problem comes in due to the recent introduction of their web services, where sites can build essentially unlimited pages based on dynamic real-time queries to Amazon. I don't believe their intent is to "thrash around" in these sites, which is what is occurring.

    A few months ago, I asked to have the Alexa bot crawl my site (StarvingMind.net [starvingmind.net]), as I was curious about the reports it was able to generate. The bot ended up in endless loops and had to be manually stopped by someone at Alexa. They spent an impressive amount of time trying to identify and fix the problem my site was creating for their bot. I don't know whether my specific problem was ever resolved, but I have the impression the bug was found and fixed. I also have the impression that the bot is very immature code and buggy.

    Based on the personal and public responses I have seen from the Amazon and Alexa people involved, they actually do care about these issues very much, and don't wish to cause harm by the bot's use. I believe their goal is to eliminate the link rot that has accumulated on associate sites over the years, many times with the site owner unaware of the problem.

    Web services threw a curve into the mix, and that is where the major problems are occurring. The post I am replying to seems to imply Amazon may want to "use then throw out" the associates. I think that is pure speculation without any knowledge of the facts. Amazon has recently gone from what appeared to be no full-time staff to a team of people dedicated to supporting and running the associates program. I believe they consider it a very cost-effective way of advertising, and I expect it is doing quite well for them. Based on their recent actions, I believe they are trying to build a strong long-term relationship with the active ones of us, as we bring them a fair amount of business.

    Another post has pointed out that they have stopped the crawl while the issues talked about here are looked into. They realize they may have made a mistake, and are trying to figure out how to address the problem. They have been responsive (with me at least) in resolving problems like this in the past, and they deserve a chance to resolve it this time as well. They have started down the right path by stopping the crawl.

    -Pete
    • by Anonymous Coward
      Never ascribe to intent what may be accounted for by simply rolling out premature code that has been subjected to very little testing. Amazon has a bias toward making schedules at the expense of testing.
  • ...to what any sensible software engineering team would have built as a reactive solution?

    Problem:

    Some of our affiliates have out of date links.

    Dumb Solution:

    Create a stupid, high-bandwidth-consuming spider that endlessly crawls affiliate sites looking for out-of-date links;

    or

    Sensible Solution:

    When an out-of-date link comes in to the website, display an apology screen to the visitor (whilst not letting up on any other sales opportunity) and email the affiliate telling them to get their site up to date.

    Some people just don't fink.
    • In the mean time, you've just lost at least one sale per broken link. Perhaps they don't think that's acceptable?
      • In the mean time, you've just lost at least one sale per broken link. Perhaps they don't think that's acceptable?

        Amazon seems to be good at recommending items in relation to what you're searching for... Why not just force-feed another one of these "People who searched for this item also enjoyed these (totally unrelated, by the way) items."
        That way you potentially save a sale (don't tell me that every single person who clicks on one of those amazon links actually BUYS the product), you manage to annoy the reader with some free ads, and potentially screw the associate out of a sale. Everyone wins. (OK, except perhaps the associate.)

        • Just a suggestion, maybe add "Montreal" somewhere in your sig? If it's your website, maybe enlarge the text "Montreal's rave community" on the home page, possibly in both English and French.

          Man, I live in the New York City region, but it's like outer Mongolia for rave info.

          -
            Just a suggestion, maybe add "Montreal" somewhere in your sig? If it's your website, maybe enlarge the text "Montreal's rave community" on the home page, possibly in both English and French. Man, I live in the New York City region, but it's like outer Mongolia for rave info.
            Really offtopic here, but since you're not listing an email addy, I can't get back to you directly.

            We're not aiming to stay local. We're really aiming to have local people help us cover their local rave scene. I'd really like to be partying in montreal, toronto, and NYC, but there's only one of ol'lil'me, and just so many hours in a week. If you want to take up the NYC section of the site, just drop me a line, and we'll make you some room. Even get on a bus and go party with you guys sometime.

  • by Anonymous Coward
    Alexa's web crawler is great from one perspective and terrible from another.

    On the great side their crawler can easily use an entire T3 with just a stock PC driving the requests.

    On the terrible side, the crawler is stateless - it has NO IDEA OF WHAT IT'S RECENTLY DONE. It doesn't know when it has hit a particular site 1M times in the last hour.

    So when they say "it only crawled each site on average every 4 seconds" that is on average. You know, take total urls divided by total time. Doesn't say anything about how hard they hit aaa.com

    The problem is that the crawler is designed in the extreme to be efficient. Keeping site stats and blocking GETs is inefficient.

    You generate a list of URLs for it to crawl. It blindly crawls this list in order. To prevent aaa.com from getting hit with the first 100k requests (assuming aaa.com has 100k URLs in the list), you randomize the list before crawling.

    Problem is the randomization isn't perfect, and also any site with a high % of urls in the list is still going to get hammered.

    Now I don't know if this is the crawler Alexa used on the associates. But I wouldn't be too surprised.
    • Just a note on your comment about "it only crawled each site on average every 4 seconds". They actually address this: "To eliminate any possibility that a particular site will be hit more than once every two seconds, our crawler records the most recent two minutes of IP addresses and calculates the frequency of a particular IP within that two minute time frame. If the IP has been hit above a programmable threshold set at two seconds, the page is sent back to the cache and not crawled until the frequency is greater than two seconds." Last time I checked, I can surf faster than that.
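
      Taken literally, the mechanism they describe would look roughly like this (a sketch of the quoted behaviour, not Alexa's actual code):

      import time

      MIN_INTERVAL = 2.0       # no more than one hit per site every two seconds
      WINDOW = 120.0           # "records the most recent two minutes of IP addresses"

      recent = {}              # ip -> timestamp of the last request sent to it

      def may_crawl(ip, send_back_to_cache):
          now = time.time()
          # Forget anything older than the two-minute window.
          for addr in [a for a, t in recent.items() if now - t > WINDOW]:
              del recent[addr]
          if now - recent.get(ip, 0.0) < MIN_INTERVAL:
              send_back_to_cache(ip)      # defer this page rather than hit the site again
              return False
          recent[ip] = now
          return True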
  • Even though I signed up for the Amazon.com associates program for Downside [downside.com], I'm not seeing any hits from strange user agents. That's a relief, because I have hundreds of links which change daily. I don't need some 'bot trying to download the entire database of financial statements for every US public company.

    Looking at user agents, the browser war is over. IE is #1, and Netscape often isn't even in the top 10; various indexer 'bots generate more traffic than Netscape.

    • Re:OK from here (Score:1, Informative)

      by Anonymous Coward
      > Looking at user agents, the browser war is over. IE is #1, and Netscape often isn't even in the top 10; various indexer 'bots generate more traffic than Netscape.

      Looking at user agents is incredibly foolish since most browsers' agent strings default to IE and most users don't change that default.
  • 21 Dog years (Score:1, Interesting)

    by Anonymous Coward
    This is slightly offtopic, but if you are in the NY area, I highly recommend you see the play "21 Dog Years: Doing Time@Amazon.com" about a guy who went from customer service to bizdev to resignation. It's based on this book [amazon.com]; and yes, it is very funny that Amazon carries it. They profit from their own critics.
  • "Absent from our suggested federal response is a role for the Federal Communications Commission. The reason is straightforward: the distributed denial of service attacks involve coordinated and criminal transmission of content over the Internet. It is hard to see how the FCC has statutory authority over such matters. Yet even if it had, or were given, such authority, the agency currently lacks the resources and expertise to do what is necessary at this point, namely, to fight the criminal activity. Simply put, useful FCCinvolvement would require statutory changes, additional resources, and additional expertise to succeed. This is work better left to law enforcement agencies."

    Okay, note the line "...distributed denial of service attacks involve coordinated and criminal transmission of content over the Internet"

    Criminal transmission of content? WTFF??

    Note also how it goes on to say the FCC shouldn't get involved since "FCC involvement would require statutory changes..." In other words, let's not waste time with all this analysis and law-making business and just get straight to the enforcement of what we want.
  • Alexa is all over my web logs every day....I don't even link to amazon (or any other commercial site, just some basic open source ones...apache, openbsd, sourceforge, etc)

    Soon I might just block them....but I would like to know how I got on their list of sites to crawl to excess.
  • ... why don't they just collect the 404s off the requests to their site? No need for spiders; if someone puts up a bad link, they can find out as soon as someone clicks on it. *sheesh*
  • Better Thread (Score:2, Insightful)

    by Chetmurray ( 216997 )
    [prosperotechnologies.com]
    Here amazon admits the issue and how they have stopped the bot until they can investigate the issue.

    Amazon is actually very affiliate-friendly. They have banned scumware like wurldmedia, ebates and others that try to hijack affiliate commissions. Unlike affiliate programs by overstock.com, buy.com and others that are so desperate for short-term cash they will screw over their current affiliates for some quick cash.

    Considering buy.com is so deep in with the scumware people, I am surprised slashdot.org advertises them.
  • I used to work for Amazon.com as a Unix Admin and I can tell you Amazon and Alexa are barely related. They are two different companies; it's just that one owns the other. Barely anything between them is integrated at the computer-system level. The main offices for Amazon.com are in Seattle and Alexa's offices are in S.F., CA.
    If someone is making a mistake at Alexa, Amazon.com cannot really be held responsible.
