The Internet | Your Rights Online

The Internet Archive Sued Over Stored Pages

Kailash Nadh writes "The Internet Archive, which has been storing snapshots of millions of webpages since 1996, has been sued by the firm Harding Earley Follmer & Frailey of Philadelphia. The firm was defending Health Advocate, a company in suburban Philadelphia that helps patients resolve health care and insurance disputes, against a trademark action brought by a similarly named competitor. In preparing the case, representatives of Earley Follmer used the Wayback Machine to turn up old Web pages - some dating to 1999 - originally posted by the plaintiff, Healthcare Advocates of Philadelphia. Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal." CT: update - note that the submitter got it backwards: Healthcare Advocates is suing the Wayback Machine and Harding Earley Follmer & Frailey, not the other way around.
  • by 0110011001110101 ( 881374 ) on Wednesday July 13, 2005 @07:18AM (#13052033) Journal
    Fsck me if I'm wrong, but wouldn't this be similar to suing someone for referencing an old book I wrote, just because I'd released a new one that didn't contain much of the old information?
    • A book is a physical object; you can reference a book as long as you do not republish it in its entirety. The internet isn't a physical object; it's a collection of bytes arranged in a specific manner. It's that arrangement that makes it simple to take someone else's work and republish it, almost effortlessly.

      The law has the ugly job of sorting out what constitutes copyright infringement -- republishing a website, perhaps? With the internet, it has become infinitely easier to republish works in their entirety.
    • by mwvdlee ( 775178 ) on Wednesday July 13, 2005 @08:06AM (#13052401) Homepage
      "We've lost our case based on evidence and will now be suing the organisation that provided the evidence for doing so".
  • Robots.txt? (Score:3, Insightful)

    by AltGrendel ( 175092 ) <ag-slashdot.exit0@us> on Wednesday July 13, 2005 @07:18AM (#13052038) Homepage
    Did they set up their robots.txt file properly? If not, they may not have a case.
    • Re:Robots.txt? (Score:5, Insightful)

      by Looke ( 260398 ) on Wednesday July 13, 2005 @07:25AM (#13052082)
      Why would a missing robots.txt imply that others are allowed to distribute the content?
      • by Gopal.V ( 532678 )
        > Why would a missing robots.txt imply that others are allowed to distribute the content?

        It should be treated the same way trespassing for unfenced property is treated.

        The case should be dismissed: the Archive reproduces, verbatim and with attribution, content that was published for public bot scraping.

        Now what, will someone sue Yahoo! or Google for caching pages or converting PDFs to HTML? Or Coral Cache for unauthorized reproduction of websites?
        • Alas, copyright law and the laws of trespass are essentially two different branches. Analogies don't really work because different parts of the law are quite deliberately designed to work in different ways. It'd be like me comparing putting peanut oil into a car to reading pornography from a floppy disk, or me comparing your analogy to the one I just made.

          Here's the deal, and it's not very good. If the Wayback Machine doesn't have permission (implied or otherwise) to archive websites and serve copies of them,

      • Because that is the nature of this beast called the WWW. Yeah, I know about copyright and such, and no, I am not one of those who believe information should always be free, but I do believe that when you publish to a medium that has sharing, caching and linking at its core, then you cannot blame others for your publication being shared, cached and linked to.

        Think before you publish etc.
      • A present (not missing) robots.txt file which didn't include a rule for those pages might imply permission to cache...
      • Re:Robots.txt? (Score:5, Insightful)

        by slavemowgli ( 585321 ) on Wednesday July 13, 2005 @08:22AM (#13052585) Homepage
        Concludent behaviour (conduct implying consent). If I go to a doctor and get an injection, can I come back six months later and sue the doctor because he did not explicitly ask for permission to give me that injection? Well, I can sue, of course, but I won't get far: when he said "I'll have to give you an injection" and I didn't say no but instead rolled up my sleeve so he could give it to me, he was allowed to conclude that I was OK with it, even if I did not explicitly say so.

        IANAL, but I personally think the same principle should apply here. There is a standard mechanism for limiting access (in the sense of not authorizing it, that is, not of making it technically impossible) - namely, robots.txt exclusion. But if you choose not to use it, then the fact that you are running a *public* webserver whose *sole purpose* is handing out its information to *everyone* who asks for it should be enough to conclude that you are OK not only with people receiving your information, but also with them using it - no matter whether that means reading it (like a regular user would), indexing it (like a search engine would) or archiving it (like the Internet Archive *and* just about any search engine would).
    • Re:Robots.txt? (Score:3, Informative)

      by Baddas ( 243852 )
      As it says in the article, the robots.txt is an entirely voluntary measure. The IA doesn't need to obey it, but they do, in order to be a courteous member of the internet.
    • Re:Robots.txt? (Score:3, Informative)

      by Illserve ( 56215 )
      They don't have a case either way!

      Adherence to robots is voluntary, done in good faith by crawlers for the general well being of the web.
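The point made in this sub-thread - that robots.txt adherence is purely voluntary - is easy to see in code: a compliant crawler must fetch and honor the file itself, and nothing on the server can force that check. A minimal sketch using Python's standard urllib.robotparser (the URLs are hypothetical; ia_archiver is the Internet Archive's crawler name):

```python
from urllib import robotparser

# A polite crawler parses the site's robots.txt before fetching -- but
# only because it chooses to; the server cannot enforce this.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ia_archiver",   # the Internet Archive's crawler
    "Disallow: /",               # ask it not to archive anything
])

# The parser merely reports the site owner's wishes:
assert not rp.can_fetch("ia_archiver", "http://example.com/old-page.html")
# A different (or impolite) crawler is unaffected by that rule:
assert rp.can_fetch("SomeOtherBot", "http://example.com/old-page.html")
```

Whether can_fetch is ever consulted is entirely up to the crawler's author, which is why the convention is described here as good-faith cooperation rather than access control.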
  • Cached (Score:5, Funny)

    by donnyspi ( 701349 ) <junk5@d[ ]yspi.com ['onn' in gap]> on Wednesday July 13, 2005 @07:18AM (#13052039) Homepage
    Better sue everyone who has visited the website in question but never purged their temporary internet files folder.
  • Other archives (Score:3, Insightful)

    by erykjj ( 213892 ) on Wednesday July 13, 2005 @07:18AM (#13052040)
    Would that make archived newspaper editorials, TV reports, etc. illegal as well? Google beware.
  • Looking forward (Score:3, Insightful)

    by AtariAmarok ( 451306 ) on Wednesday July 13, 2005 @07:19AM (#13052043)
    Looking forward to newspapers filing similar frivolous lawsuits against libraries for maintaining old copies of the papers in their collections; copies the newspaper company might be embarrassed about now.
    • Re:Looking forward (Score:4, Interesting)

      by aussie_a ( 778472 ) on Wednesday July 13, 2005 @07:33AM (#13052143) Journal
      Again, not comparable (but this didn't stop you from getting modded up, of course). The libraries had permission to buy the papers and allow access to them in the first place. Internet Archive had no such agreement with this company. IA took the absence of them saying no as an implicit agreement, which for pretty much anything else isn't legal (it hasn't been tested yet with websites and caches). They did, in fact, say no - but a bug caused this message not to be delivered/it was ignored some of the time.
        Re:Looking forward (Score:3, Interesting)

        by Ninwa ( 583633 )

        "The libraries had permission to buy the papers and allow access to them in the first place. Internet Archive had no such agreement with this company."

        You make it sound as if the Internet Archive archived pages that required authorization. All pages they archived were available to the public at that point in time, therefore no contract is required. IANAL, so correct me if I'm wrong - obviously their lawyers would say that I am - but I think this lawsuit is frivolous.

  • by ID000001 ( 753578 ) on Wednesday July 13, 2005 @07:20AM (#13052048)
    ....why not just ask them to take them off?
  • Library (Score:3, Insightful)

    by Pablo El Vagabundo ( 775863 ) on Wednesday July 13, 2005 @07:20AM (#13052053)

    Wouldn't this reference site be covered under some of the same protections as a library? It serves some of the very same purposes.

    Hopefully this falls flat.

    I wonder where the servers are located.

  • by akadruid ( 606405 ) <{ku.oc.diurdeht} {ta} {todhsals}> on Wednesday July 13, 2005 @07:20AM (#13052055) Homepage
    Lawsuits these days sound more like people whining like spoiled brats than like someone who has really been done an injustice.

    They publish the thing, person X stores it, person Y uses the stored info to prove they published it. So what? If they'd written the thing in a newspaper, would they sue someone for keeping the newspaper?

  • by wallykeyster ( 818978 ) on Wednesday July 13, 2005 @07:21AM (#13052056)
    Would a lawsuit be considered if instead of a cache of web pages, the other side had used old newspapers from the library or VHS recordings of an old television broadcast? Once they've put their web pages into the public, don't they lose control of who keeps a copy?
  • God damnit (Score:4, Insightful)

    by colonslashslash ( 762464 ) on Wednesday July 13, 2005 @07:21AM (#13052059) Homepage
    I don't know about you guys, but this whole "sue anything that moves" culture is really starting to piss me off.

    I'm not saying that legally they don't have a legitimate case, but is it really necessary to pursue an organisation such as the Internet Archive over something as passive as this? In my opinion, hell no it isn't.

    • Re:God damnit (Score:4, Interesting)

      by Illserve ( 56215 ) on Wednesday July 13, 2005 @07:57AM (#13052318)
      It's going to get worse before it gets better. Our culture is being forced to confront issues of privacy and information ownership that have previously stayed under the radar only because violating them was inconvenient or expensive.

      But the internet is changing that, and now an errant picture or snippet of text can be reproduced and distributed widely for practically zero dollars.

      I think eventually we'll settle on some kind of bubble of privacy concept, in which anything inside is legally protected, but anything you distribute outside that bubble is fair game for anyone, forever.

      This is generally the case in the real world. If someone wears clothes, they have effectively created a privacy bubble, allowing only limited information about themselves (via reflected light) to be seen by others. But what information they do allow to escape is fair game for distribution in photographs.

      In a sci-fi series (Neverness et al.), Zindell argues that in the future, even identity will be as carefully concealed in public as one's privates. As information technology saturates our culture, even revealing our identity in public is going to be increasingly dangerous.

      Of course DRM advocates will try to attach little bubbles of limited privacy to specific bits of content released into the wild. Eventually, I hope, common sense will prevail and such ridiculous notions will be abandoned.

      • Re:God damnit (Score:4, Interesting)

        by _LORAX_ ( 4790 ) on Wednesday July 13, 2005 @08:29AM (#13052667) Homepage
        Actually, no.

        The courts have held that it is illegal to disseminate things not plainly visible ("plainly" meaning obvious to a human at a reasonable distance or in a public place). Take turning on night vision during the day: it captures IR and translates it to B&W, and the problem is that our bodies reflect more of it than our clothes do, giving all clothes a semi-transparent look. The courts have held that even though the recording was done in public, it violated the privacy of the people taped. This doesn't mean that all IR captures in public are illegal, but when it's specifically used to reveal information about a person that is not plainly visible, it might be a crime.

        The courts have also held that augmentation of the senses cannot be used as an excuse to break the 4th Amendment. Cops can only use what is plainly visible to initiate a search on a private residence. This precedent was set after they used heat signatures to get warrants for pot growers (because of the grow lamps used). Remember that with today's technology you can basically see movement and hear speech through walls.

  • summary is incorrect (Score:5, Informative)

    by paulbd ( 118132 ) on Wednesday July 13, 2005 @07:22AM (#13052064) Homepage

    The archive is being sued by Health Advocates, not the legal firm that had defended Health Advocates. In fact, the legal firm is named in the suit as well.

    And to clarify: it's not a simple "you have our stuff stored on your systems" claim. Rather, Health Advocates is claiming that the archive failed to follow the instructions in robots.txt that were intended to prevent access to historical material.

    • by kevmo ( 243736 ) on Wednesday July 13, 2005 @08:06AM (#13052386)
      HealthCARE Advocates is suing, not Health Advocates. There is a trademark case of Healthcare Advocates (plaintiff) suing Health Advocates (defendant). The legal firm defending Health Advocates dug up the old archive. Healthcare Advocates, the plaintiff, got desperate and is suing the legal firm and IA, probably in order to try to exclude whatever evidence the defense legal firm dug up.

      I guess you were trying to be informative, but in this case it makes a big difference which company is doing the suing. It's the plaintiff, not the defendant.
  • by div_2n ( 525075 ) on Wednesday July 13, 2005 @07:22AM (#13052065)
    They got caught with their pants down and now are suing because someone kept the evidence. Boy, do I hope this lawsuit meets a swift and decisive end in favor of the Internet Archive.

    To be candid, I'm surprised it took this long for someone to sue them.
  • by inkdesign ( 7389 ) on Wednesday July 13, 2005 @07:23AM (#13052073)
    ..on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

    For the "I don't wanna rtfa because it's early" crowd.
    • by Stalyn ( 662 ) on Wednesday July 13, 2005 @07:40AM (#13052184) Homepage Journal
      you forgot,

      In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.


      Even if they had, it is unclear that any laws would have been broken.

      "First of all, robots.txt is a voluntary mechanism," said Martijn Koster, a Dutch software engineer and the author of a comprehensive tutorial on the robots.txt convention (robotstxt.org). "It is designed to let Web site owners communicate their wishes to cooperating robots. Robots can ignore robots.txt."

      William F. Patry, an intellectual property lawyer with Thelen Reid & Priest in New York and a former Congressional copyright counsel, said that violations of the copyright act and other statutes would be extremely hard to prove in this case.
    • by cdrudge ( 68377 ) * on Wednesday July 13, 2005 @07:57AM (#13052319) Homepage
      For the "It's too early to think" crowd...

      How did Healthcare Advocates determine that Harding Earley was making hundreds of requests for files on the Wayback Machine? The logs would have been kept on the Wayback Machine's servers, not on anything Healthcare Advocates would have easy access to. Harding Earley would be accessing the files via the Wayback Machine's copies, not the copies kept on Healthcare Advocates' website.
    • by poena.dare ( 306891 ) on Wednesday July 13, 2005 @08:04AM (#13052368)
      The suit contends, however, that representatives of Harding Earley should not have been able to view the old Healthcare Advocates Web pages - even though they now reside on the archive's servers - because the company, shortly after filing its suit against Health Advocate, had placed a text file on its own servers designed to tell the Wayback Machine to block public access to the historical versions of the site.

      So the robots.txt was added YEARS AFTER the site had been archived. I don't think they correctly used the "no-archive-time-travel" directive.
  • by TractorBarry ( 788340 ) on Wednesday July 13, 2005 @07:24AM (#13052081) Homepage
    Assuming the judge has more than one brain cell, this case should take no more than 30 seconds and be summarised in two sentences:

    "You published information on a public medium. Case dismissed."

    But then again, this is America we're talking about... home of the idiot lawsuit and lunatic judicial decisions, so I don't hold out much hope for the triumph of reason...
  • by Sierpinski ( 266120 ) on Wednesday July 13, 2005 @07:26AM (#13052094)
    Stella! [stellaawards.com]
  • the bottom line (Score:5, Insightful)

    by countzer0interrupt ( 628930 ) <countzer0interru ... m ['oo.' in gap]> on Wednesday July 13, 2005 @07:29AM (#13052115) Homepage
    He said that the robots.txt file is part of an entirely voluntary system, and that no real contract exists between the nonprofit Internet Archive and any of the historical Web sites it preserves.
    Exactly right. The plaintiff is an asshat. The bottom line for publishing anything to the Web is: if you don't want it copied across the world, saved on people's hard disks (either automatically in a browser cache, or deliberately by the user), and potentially redistributed (after your initial act of publishing) for the rest of time, don't publish it to the Web.

    I'm not advocating the breach of copyright here - sure, I want credit of paternity for anything I put on the Web, at the very least. Pragmatically, however, I know that the Web (and the Internet at large) is a much more fluid medium. Somebody may save my webpage, copy a quote from it, download an image and use it as their desktop wallpaper, simply because they can. I can't stop them, and I'll never have proof that they did it, so I couldn't sue them if I wanted to.

    Therefore, I should exercise some common sense and remember that the Web is a public medium; if my work is so precious, then maybe I shouldn't put it up there. Some web site owners want to use the power of the web to reach huge numbers of people, but they don't want to pay the price of such a fast and powerful medium. Once your words are out there, you may never get them back.
  • by dyfet ( 154716 ) on Wednesday July 13, 2005 @07:32AM (#13052135) Homepage
    ...when someone is finally sued for the "unauthorized memories" they carry with them...oh never mind, that already happened.

  • by hhghghghh ( 871641 ) on Wednesday July 13, 2005 @07:35AM (#13052157)
    This is a case where the plaintiff of an action (that they probably lost) is suing opposing counsel for using the Internet Archive to look for old documentation that was used as evidence against its claims. In effect, they're claiming that because they had a robots.txt, any page that might have been on the Internet Archive was there illegally, and shouldn't have been used as evidence.

    In effect, they're saying "we were wrong, we tried to destroy the evidence of our wrongdoing, but because the shredder jammed and you found the evidence anyway, you're abusing our copyright".

    The court hearing their argument should thoroughly smack them. Perhaps they should be brought to justice for trying to destroy evidence (or instructing a third party to do so); surely that's illegal in these post-Enron days.
  • by MrNonchalant ( 767683 ) on Wednesday July 13, 2005 @07:42AM (#13052207)
    info@healthcareadvocates.com [mailto]

    Be gentle, they might be in the right after all.
  • by millennial ( 830897 ) on Wednesday July 13, 2005 @07:54AM (#13052290) Journal
    ... if they lose this fight.
    For example, 2600 Magazine's old web site containing a copy of the DeCSS source code is stored in the Archive. Could the Archive be held in violation of the DMCA for mirroring someone else's old site?
  • by FooHentai ( 624583 ) on Wednesday July 13, 2005 @07:56AM (#13052310) Homepage
    "Day by day and almost minute by minute the past was brought up to date. In this way every prediction made by the Party could be shown by documentary evidence to have been correct; nor was any item of news, or any expression of opinion, which conflicted with the needs of the moment, ever allowed to remain on record. All history was a palimpsest, scraped clean and reinscribed exactly as often as was necessary."
  • Excuse me but... (Score:4, Insightful)

    by hacker ( 14635 ) <hacker@gnu-designs.com> on Wednesday July 13, 2005 @08:08AM (#13052419)

    First and foremost, the existence of a robots.txt does not constitute a contract between the client (a web surfer/browser agent) and the server (the site hosting the content proper). Repeat that over and over. Nothing says that my crawler or spider must request the robots.txt on your server.

    It's preferred, but not required. Even so, I am free to ignore it if I want, and parse whatever links I see fit to grab. If you make the content public and I want to read that content, I'm going to get it, whether you have robots.txt in place or not.

    Secondly, has anyone taken the time to validate the robots.txt [searchengineworld.com] file found on the site in question [healthcareadvocates.com]? Note too that they just changed robots.txt on July 8th of this year. Did the previous version validate? Are they trying to rewrite history again? What did the old version look like?

    If there is even so much as one error, robots/crawlers are free to ignore/parse/merge/break it as they see fit. It happens all the time, and even when robots.txt is perfectly valid, many robots and crawlers ignore it anyway (msnbot and Yahoo's crawlers are two of the worst offenders here).

    But back to the first point, robots.txt is a guideline, not a rule, not a contract, and certainly not something that can be enforced. Does lack of a robots.txt file constitute the legal right to publically redistribute the content? Or store it for later review and retrieval? How do you know any of your former employees from 1996 haven't stored your entire website on floppy, one page at a time? Did they adhere to robots.txt? Did ANYONE adhere to robots.txt in 1996? It seems that there was evaluation of the Robots Exclusion Standard [robotstxt.org] in 1996, but was everyone using it? Not likely.

    Microsoft Internet Explorer will certainly store the entire website for "reading offline" if you ask it to do so when bookmarking it. They don't parse robots.txt to exclude pages that shouldn't be stored locally.

    It's too bad that people need to try to erase history to prevail in litigation. This isn't George Orwell's 1984... well, at least not yet anyway.
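The comment above asks whether the site's robots.txt even validates; a related point is that parsers are typically lenient, silently skipping lines they cannot understand rather than rejecting the whole file. A small sketch of that behavior using Python's standard urllib.robotparser (the file contents and URLs here are hypothetical; other crawlers' parsers may behave differently):

```python
from urllib import robotparser

# A robots.txt containing a malformed line: urllib.robotparser skips
# directives it cannot parse instead of rejecting the whole file.
rp = robotparser.RobotFileParser()
rp.parse([
    "Disalow: /typo-before-any-user-agent",  # misspelled directive: ignored
    "User-agent: *",
    "Disallow: /private/",
])

# The well-formed rules still take effect:
assert not rp.can_fetch("AnyBot", "http://example.com/private/page.html")
assert rp.can_fetch("AnyBot", "http://example.com/index.html")
```

Lenient parsing cuts both ways: a small syntax error will not break a polite crawler, but it also means a site owner cannot be certain which rules any given crawler actually honored.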

  • by MrBandersnatch ( 544818 ) on Wednesday July 13, 2005 @08:19AM (#13052547)
    But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robots.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

    In so doing, the suit claims, the law firm violated the Digital Millennium Copyright Act, which prohibits the circumventing of "technological measures" designed to protect copyrighted materials. The suit further contends that among other violations, the firm violated copyright by gathering, storing and transmitting the archived pages as part of the earlier trademark litigation.

    Wow, that is stretching things! I've never read the DMCA, but to claim that a robots.txt file (which isn't a legally binding mechanism by any means), added to the site after the pages had been indexed and ignored by the Wayback Machine, constitutes circumvention of a copyright protection measure and a violation of that act... well, I'd fully expect any judge to have a good laugh at this.

    HOWEVER, given how poor the US legal system is, I wouldn't be surprised to hear that robots.txt gains legal status as a binding document for crawlers!
  • by mcc ( 14761 ) <amcclure@purdue.edu> on Wednesday July 13, 2005 @09:48AM (#13053428) Homepage
    For some reason all that comes to mind when I hear the reasoning behind the filing of this lawsuit is "Liar, Liar".
    JIM CARREY: I object!
    JUDGE: On what grounds?
    JIM CARREY: It's devastating to my case!
  • Sue this (Score:3, Funny)

    by fluor2 ( 242824 ) on Wednesday July 13, 2005 @09:50AM (#13053446)
    I demand that Slashdot will remove this comment after 3 days! Or else I'll see you in court!

  • Sue a witness? (Score:5, Informative)

    by Neurotoxic666 ( 679255 ) <neurotoxic666@@@hotmail...com> on Wednesday July 13, 2005 @01:38PM (#13055981) Homepage
    Can you sue a witness because he remembered facts against you during a trial? That is essentially what is happening here: the Wayback Machine is being sued because it "remembers" old facts and sayings and has been used in court...
