Privacy Government The Courts The Internet News

Wayback Archives as a Law Tool 198

Carl Bialik from the WSJ writes "The Wayback Machine's internet archive and Google's cached pages are becoming indispensable tools for some lawyers, especially specialists in intellectual-property law. Dell has used copies of expired websites to get the domain name DellComputersSuck.com transferred to it, the Wall Street Journal reports. EchoStar used Wayback in a case against a Polish TV company. Playboy checks Wayback to look for infringers of its trademark bunny or other images. And Wayback was even used to discredit a witness and reach a mistrial in a Canada murder case."
This discussion has been archived. No new comments can be posted.

  • by bigwavejas ( 678602 ) * on Friday July 29, 2005 @01:29PM (#13196330) Journal
    WBM works great for pulling up historical text content, but I've noticed it tends to be hit-and-miss with images. Try pulling up a website and chances are you'll see broken image links.
    • by el_gordo101 ( 643167 ) on Friday July 29, 2005 @01:43PM (#13196480)
      That's because they don't save the image files, as far as I can tell. The images actually point back to the site that was archived through some sort of re-direct on the WBM site. If the image files no longer exist on the original site, they will not display on the WBM archived page.
      • That's because they don't save the image files, as far as I can tell. The images actually point back to the site that was archived through some sort of re-direct on the WBM site.

        I don't think that's the case, I don't think it even tries to grab the images from the server. At least with my webpage, I just clicked on the Nov 17th, 2001 archive... it had all the old images that I've long since deleted... and the server logs show no hits/404's for those images...

        Maybe that's not always how it operates t
        • by el_gordo101 ( 643167 ) on Friday July 29, 2005 @02:02PM (#13196674)
          Interesting. Our sites date back to 1999. The image files from the older versions of sites do not show up, as these files were deleted long ago. The newer versions that use the image files that are still resident in our /images directory work fine. I am not sure how they handle images.
          • As far as I could tell from experimenting with an old site of mine WBM tends to have multiple copies of sites from different times. If an image, page or any other object is missing from one of those archived copies it will redirect to the next most recent copy of that file in the WBM and so on. If the latest archived copy still doesn't have the file THEN it redirects to the actual website.

            I've often noticed that WBM pages load very slowly and I suspect that this is partly due to all the chained redirects yo
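The fallback behavior described in the comment above can be sketched roughly as follows. This is only an illustration of the commenter's guess, not how the Wayback Machine is known to work: the `snapshots` data model, the `resolve` function, and the reading of "next most recent" as "next later capture" are all assumptions made for the example.

```python
def resolve(snapshots, requested_ts, path, live_base):
    """Find where a request for `path` at `requested_ts` would end up.

    snapshots: hypothetical dict mapping a capture timestamp string
    (YYYYMMDD) to the set of paths stored in that capture.
    Returns (source, location, hops), where hops counts the captures
    that were tried and did not contain the file.
    """
    hops = 0
    # Try the requested capture first, then each later capture in order.
    for ts in sorted(t for t in snapshots if t >= requested_ts):
        if path in snapshots[ts]:
            return ("archive", ts, hops)
        hops += 1
    # No capture has the file: fall through to the live site.
    return ("live", live_base + path, hops)


snapshots = {
    "19990101": {"/index.html"},
    "20011117": {"/index.html", "/images/logo.gif"},
    "20050101": {"/index.html"},
}
print(resolve(snapshots, "19990101", "/images/logo.gif", "http://example.com"))
print(resolve(snapshots, "20050101", "/images/logo.gif", "http://example.com"))
```

Under this model each missed capture costs one more redirect, which would be consistent with the slow page loads mentioned above.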
    • They seemed to have captured Goatse [archive.org] just fine.
  • by Shadow of Eternity ( 795165 ) on Friday July 29, 2005 @01:30PM (#13196338)
    People go to the library and dig through hundreds of old newspapers and records. What's the big deal with using Wayback for websites?
    • For one, it's not quite as verifiable. Who is to say, for example, that someone with access to the Wayback servers couldn't put their own content and dates on there, and then use that as "evidence" for some suit?

      I don't know how (if?) it's regulated. Any insights into this?
      • by Mr Guy ( 547690 ) on Friday July 29, 2005 @01:37PM (#13196420) Journal
        In the article, it mentions one of the archive's technicians signing an affidavit saying they think it's a true archive. No one would ever lie about that for a big corporate payout.
        • They might. But if the other side can provide some evidence that it isn't a true archive, then said technician is in deep shit.
          • No, they're not. The Technician is merely saying, to the best of his knowledge, it is a true archive.

            In court, a witness can fail to tell the truth and still not be committing perjury. Haven't you ever seen or heard or read a transcript of a trial where one attorney will ask a witness, "And to the best of your knowledge?"

      • by TTK Ciar ( 698795 ) on Friday July 29, 2005 @01:52PM (#13196577) Homepage Journal

        For one, it's not quite as verifiable. Who is to say, for example, that someone with access to the Wayback servers couldn't put their own content and dates on there, and then use that as "evidence" for some suit?

        I don't know how (if?) it's regulated. Any insights into this?

        I work at The Archive. There are only two people, three at most, with the expertise and access to pull something like this off, and if someone tried Brad would almost definitely notice. There are checks in place to detect bitrot in the web archive, and altering older ARCs to include new information would be detected as bitrot and flagged for closer attention. They would then be compared against the copies in our sister organization's data cluster in Europe, and possibly also compared against the copies in the datacenter in Egypt.

        To make it work, you'd pretty much have to get Brad to play along, and he is fanatical about the integrity of the web data. I don't think you could pay him enough to do it, and he doesn't have any sons or daughters you could kidnap for blackmail.

        How one would go about demonstrating all of this in court, though, I do not know. IANAL.

        -- TTK
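A check of the kind described above, where altered data shows up as bitrot and is then compared against copies held elsewhere, might look something like this minimal sketch. The replica names, the `audit` function, and the choice of SHA-256 are assumptions for illustration; the post does not say how the Archive actually implements its checks.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of a stored record (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def audit(recorded_digest: str, replicas: dict) -> list:
    """Return the names of replicas whose bytes no longer match the
    digest recorded when the file was first written.

    replicas: hypothetical mapping of site name -> current bytes.
    """
    return sorted(
        site for site, data in replicas.items()
        if digest(data) != recorded_digest
    )


original = b"<html>archived page as captured</html>"
recorded = digest(original)

replicas = {
    "primary": b"<html>archived page, quietly edited</html>",
    "europe": original,
    "egypt": original,
}
print(audit(recorded, replicas))  # lists the replicas flagged for attention
```

A changed copy is flagged whether the cause is disk decay or deliberate editing, and the untouched replicas show which version is the odd one out.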

        • disclaimer (Score:4, Informative)

          by TTK Ciar ( 698795 ) on Friday July 29, 2005 @02:27PM (#13196885) Homepage Journal

          I do not speak for The Archive. The above post should not be considered to reflect the official position of The Archive. It is purely my own personal opinion, and it was uttered under the influence of painkillers (I had my wisdom teeth yanked out of my jaw Wednesday, qv my Slashdot journal entry). Else I probably would have refrained -- talking about this at all while there's a court case pending was probably a really stupid idea, and I (usually) know better.

          -- TTK

        • You could put up a time-signing server somewhere safe and let it sign the secure hashes of content along with the time. That would really restrict the possibility to mess with the archives without somebody noticing. There are even HSM (hardware security modules) which will let you do that while leaving the key in a safe space. I doubt the signature would normally hold up as a legal signature, but it would make it much harder to attack the content.
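The time-signing idea above can be sketched as follows, with heavy caveats. This example uses an HMAC from the Python standard library as a stand-in for a real signature; a real time-stamping service would more likely use an asymmetric key held in an HSM, so that anyone can verify without being able to sign. The function names and the token format are hypothetical.

```python
import hashlib
import hmac


def sign_capture(key: bytes, content: bytes, timestamp: int) -> str:
    """Bind a content hash to a capture time under a secret key."""
    content_hash = hashlib.sha256(content).hexdigest()
    message = f"{timestamp}:{content_hash}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_capture(key: bytes, content: bytes, timestamp: int, token: str) -> bool:
    """Recompute the token and compare in constant time."""
    expected = sign_capture(key, content, timestamp)
    return hmac.compare_digest(expected, token)


key = b"kept-on-the-signing-server-only"  # placeholder secret for the demo
page = b"<html>page as crawled</html>"
token = sign_capture(key, page, 1122650940)

print(verify_capture(key, page, 1122650940, token))           # same bytes, same time
print(verify_capture(key, page + b"<!-- edit -->", 1122650940, token))  # altered bytes
print(verify_capture(key, page, 1122737340, token))           # backdated or postdated
```

Because the timestamp is part of the signed message, changing either the content or the claimed date invalidates the token.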
        • So the system relies on the integrity of one man.

          That's fine, assuming he actually has integrity, and that he will live forever.

          For the sake of argument, I believe that this Brad guy has a lot of integrity.
          Now, does he live forever? Will he be in this position forever?

          If either of those is No, then the system will fall apart.
    • by Kenja ( 541830 ) on Friday July 29, 2005 @01:59PM (#13196643)
      You're right! We must put a stop to this movable type menace before the serfs use it to spread anti-authoritarian pamphlets!
    • Maybe not conceptually, but before archive.org, there simply was no one doing this. We don't know today, which website pages today, if we could reconstruct them, would be worth $10M as evidence for a legal case. Wayback gives us the answer, because it captures nearly all the web.

      The point is that in many cases today, the contents of a website over time can be gold. I have cracked many cases this way -- it used to be my secret weapon. Now everybody knows. Darn.
  • by awacs ( 706692 ) on Friday July 29, 2005 @01:31PM (#13196353) Homepage
    ... WBM respects the site's decision to not allow archiving. Unfortunately, those sites who might be the most interesting know that, and know that they can block archiving.
    • by pilgrim23 ( 716938 ) on Friday July 29, 2005 @01:50PM (#13196557)
      Another point of note: Net Nanny and Surf Watch or other such tools block the main sites. They do NOT block the WBM archive of goatse.cx or the like. AND THAT IS A GOOD THING!!!

      Example:

      www.copstalk.com used to be the home page for a maker of Macintosh to PC via Appletalk cross platform communications tools. They were later bought out. If you wish to look at documentation on their older products, go to the WBM. www.copstalk.com these days IS A PORN SITE.
  • In Contrast to (Score:4, Interesting)

    by goneutt ( 694223 ) on Friday July 29, 2005 @01:33PM (#13196363) Journal
    A few weeks ago /. had a bit on an Internet Archive that got sued for having material that was ordered withdrawn by a court. How can they have things both ways? I think I'm beginning to understand why law school is so hard: you have to learn to think like a lawyer, which is like learning to type with your nose.
    • you have to learn to think like a lawyer, which is like learning to type with your nose.

      Well, unless the keyboard is new the keys are dirty you know...
    • The previous /. story [slashdot.org] is referenced in the article: " Philadelphia company, Healthcare Advocates Inc., says it used a robots.txt file to block access to older versions of its site. When a law firm used the Wayback Machine to nonetheless access old material from the site, Healthcare Advocates sued, alleging computer fraud and violation of federal copyright law. In its suit, the health-care firm contends the law firm "intentionally circumvented" the robot.txt's blocking mechanism by making repeated search req
    • by fritter ( 27792 ) on Friday July 29, 2005 @01:55PM (#13196606)
      OMG D00D you're totally right! Some lawyers argue for one thing, while others argue for another! I just heard about this case the other week where this lawyer made a case that some guy was guilty of murder, but another lawyer was arguing the EXACT SAME GUY was innocent! I cannot believe they tried to have it both ways! Unlike every other profession in the world, where everyone thinks exactly alike!

      Seriously, how did this get modded "Interesting"?
      • If you're going to use the Wayback as an archive you can sue someone for using it. It's like suing someone for reading an old phone book to get your number and address that you've since made unlisted. (Although that might be stalking.) Or going to the library to look up an old magazine where you get an article that embarrasses someone.

        BTW, your bit isn't that funny.
      • If you had understood his point, you would know why it got marked interesting.

    • yuhuhpoiu sauick
    • you have to learn to think like a lawyer, which is like learning to type with your nose, and look good while you're doing it.
    • We live in a world of double standards.

      New Windows vulnerabilities announced here lead to a rehash of all the FOSS vs closed-source jokes and other stuff while tones mysteriously get more serious when FOSS vulnerabilities are found.

      Internet archives are bad and violations of copyright which lawyers will complain loudly about when they have to face incriminating evidence coming from such services... but when internet archives contain evidence needed to close cases, they suddenly become indispensable tools.

      N-
  • by October_30th ( 531777 ) on Friday July 29, 2005 @01:33PM (#13196365) Homepage Journal
    Maybe this is a bit off-topic, but employers are also known to use Google and web archives to check up on the past of a potential employee. So be careful what kind of statements you make on the net using your real name.
  • I wonder... (Score:2, Insightful)

    by Bryansix ( 761547 )
    I wonder if people will try to sue website owners for content that they already pulled off of their website. I mean, I would hope not, but I could see how this could happen. A person realizes that certain content is copyrighted and then pulls it, and later on some lawyer for the owner of this content sues and uses Google cache or WBM as a tool to prove it posted copyrighted material.
    • Re:I wonder... (Score:4, Insightful)

      by Peyna ( 14792 ) on Friday July 29, 2005 @01:36PM (#13196409) Homepage
      Just because you stopped doing something, doesn't mean it wasn't illegal while you were doing it.

      See, if I am beating the crap out of you, but stop before the police get there and witness it, that doesn't mean I wasn't beating the crap out of you and therefore guilty of battery.

      It's a weird example, but it works.

      If you've ever read some of the RIAA threat letters you'll notice they specifically state that just because you listen and pull down the offending material doesn't mean they're giving up their right to sue you for posting it in the first place.
    • In a sense, it's already happened.

      http://yro.slashdot.org/article.pl?sid=05/07/01/1355234&tid=123&tid=95&tid=155 [slashdot.org]

      This one just relies on taking an old post out of context to 'shed light' on a current situation.
  • by kensai ( 139597 ) on Friday July 29, 2005 @01:34PM (#13196378) Homepage
    If Google cached pages from the WBM and the WBM archived Google cached pages, wouldn't that cause an infinite loop? j/k
  • by DJ Rubbie ( 621940 ) on Friday July 29, 2005 @01:35PM (#13196388) Homepage Journal
    $ cat robots.txt
    User-agent: ia_archiver
    Disallow: /

    My site is not archived there, problem solved.

    (Of course, if another of these services pops up...)
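For anyone who wants to confirm what a robots.txt like the one above tells a well-behaved crawler, here is a small sketch using Python's standard `urllib.robotparser`. The `may_fetch` helper and the example.com URL are made up for the illustration, and it only shows how a compliant parser reads the rules, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Same two rules as in the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://example.com/index.html"
print(may_fetch(ROBOTS_TXT, "ia_archiver", url))   # the archive's crawler
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", url))  # any crawler not named
```

The rules only name `ia_archiver`, so other crawlers are unaffected, which is the caveat in the parenthetical above.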
  • by expro ( 597113 ) on Friday July 29, 2005 @01:36PM (#13196401)

    Destroying evidence? If I don't want to be caught and ask for older web pages to be removed that may contain incriminating evidence, such as illegal copies of things or illegal links, is this different from a request by any other copyright holder to have his pages removed, and can it be punished? What are the archive's retention policies, and have legal orders been served to prevent destruction of evidence?

    What would be even better would be if the archives digitally signed their archives and kept signatures even of those things that they had been asked to remove, so that the validity of a copy could be established if made for legal purposes (SCO, Scientologists, and other things come to mind) even if later censored.

    • What would be even better would be if the archives digitally signed their archives and kept signatures even of those things that had been asked to remove so that the validity of a copy could be established if made for legal purposes

      But how can you force the submitter of the removal request to store a copy, let alone an exact copy, from which the checksum can be calculated?

      If the submitter keeps the page it is not necessarily the same set of bytes as the removed one (think dynamic pages).

  • Hey Mr. Peabody! (Score:4, Interesting)

    by DigitalReverend ( 901909 ) on Friday July 29, 2005 @01:36PM (#13196410)
    Stand back Sherman as we set the dials on the wayback machine to 1845....

    Seriously, there have been many times I wanted to kiss the person running Wayback. I lost my home a few years back and had several websites that I lost because I hosted out of my house. I have been able to rebuild, or come fairly close to duplicating, those original sites.

    As for lawyers, if there wasn't somebody already archiving all these sites, they'd get someone to do it for them and then it would not be accessible to the public. I guess we need to take the good with the bad on this.
  • Childish Grudges (Score:2, Insightful)

    The childish nature of these corporations is ridiculous. Looking through archives of up to nine years just to point out: "Hey, you said we suck!" Who cares.

    If Dell did not suck, they would not have to be so defensive.

  • Is there nothing it can't do?
  • by markpapadakis ( 115698 ) on Friday July 29, 2005 @01:41PM (#13196465) Homepage
    Sometimes being able to see in the past is more valuable than being able to see in the future. It makes sense for lawyers to try to find ways to look back, for evidence and proof can only be found in the past.

    If you have evidence, you can prove your claims. If you can prove your claims, you win a dispute. If you win the dispute in favor of your client, that makes you one good lawyer.
  • by up2ng ( 110551 ) <chucklepatch.up2ng@com> on Friday July 29, 2005 @01:44PM (#13196493) Homepage
    In the future, what will stop someone or some entity from falsifying information, knowing that the legal system will use this?
    How can the info in wayback.org or Google be trusted? You can make redirect pages based on googlebot or Wayback that have nothing to do with what is really on the site.

    The article mentions that vodaphone.com was taken by a squatter, and they used Wayback to show that her site was "intended to misleadingly attract consumers." What if she had sent WBM's bot to a "nice" page and not the real site? That case could have gone differently.
    I think that if someone wants to they can plan ahead and use this in a nefarious fashion.
    • That's why you can selectively enter evidence.

      Their case would have gone forward just the same, except that they would have avoided showing that page.

      If the person were to bring it up in an attempt to say, "hey, this wasn't nefarious!" then they would attempt to discredit it.

      The same person was able to do the same to the WBM. Show us that the page that it's claiming that you were displaying at this date isn't what you really had up there, and the jury can decide.
    • That's why the archive's administrators signed an affidavit stating that the information was, to the best of their knowledge, not tampered with. And it would be up to the other lawyers to prove that you were falsifying the information, which could lead you into further trouble or at the very least remove any doubt of your guilt from the jury and show that you were indeed acting in bad faith.
  • robots.txt (Score:3, Informative)

    by Cyburbia ( 695748 ) on Friday July 29, 2005 @01:48PM (#13196535) Homepage
    Even if the Wayback Machine archived your site, adding an appropriate robots.txt file to your Web site's root directory will make _all_ previous archives inaccessible to the public. I discovered this by accident, after I blocked the Wayback Machine robot by accident in an attempt to control malicious spiders. After I modified robots.txt, all the old archives reappeared after a few weeks.

    I used the Wayback machine to grab thousands of messages from an old WWWBoard-based message board that I ran, for eventual conversion to vBulletin. Some years, the Wayback Machine crawled every month; others it didn't even visit. Probably 80% of the messages that were posted before 2000 are lost to the ether of cyberspace. Guess you can't expect it to archive everything.

  • Old Drivers (Score:4, Informative)

    by TheSeventh ( 824276 ) on Friday July 29, 2005 @01:57PM (#13196627)
    I was able to use the wbm last year to find some old device drivers for a no-name motherboard I had from '97. The company went out of business, and their remaining stockpiles were bought by some other Chinese company, but the wbm actually had old copies of the drivers, and even a bios update for the board. Now, I always check there when I am having a hard time finding stuff that I knew used to be around.

    /bq
  • Wait, shouldn't it be "Internet Archive's Wayback Machine" instead of "The Wayback Machine's internet archive"?

    Or maybe this was intended: "Internet Archive's Wayback Machine's internet archive."

  • Lawyers rarely Google or do research; paralegals do it. Pretty soon it's going to trickle up to the actual lawyers who use this stuff where it's coming from. Then, being rich, they will go buy loads of Google stock, if for no other reason than to keep these goodies flowing.
  • Playboy (Score:5, Funny)

    by floppy ears ( 470810 ) on Friday July 29, 2005 @02:03PM (#13196680) Homepage
    Playboy checks Wayback to look for infringers of its trademark bunny or other images.

    So they're basically just sitting around surfing porn too, eh?
  • by Rurik ( 113882 ) on Friday July 29, 2005 @02:17PM (#13196796)
    Google Caching and Wayback lookups. You could easily look URLs up by right-clicking on them.

    Oh, wait, there is one [mozilla.org].

    /shameful plug
  • by asscroft ( 610290 ) on Friday July 29, 2005 @02:28PM (#13196890)
    I imagine if you searched for sites talking about 9-11 pre-9-11-2001 you'd find some interesting things. Post 9-11 there were a zillion references, but pre-9-11 there couldn't have been that many.

    Same with the London bombings, no?

  • by notnAP ( 846325 ) on Friday July 29, 2005 @02:42PM (#13197018)
    Playboy checks Wayback to look for infringers of its trademark bunny or other images.

    In a related story, managers at Playboy have taken note of productivity differences between John Salem, who was tasked with finding instances of people illegally using the playboy logo, and Henry Waxman, who has been looking for instances of "other images," but has been observed taking frequent bathroom breaks.
  • Wow, who would have thunk that an archive of history could be useful to anyone, let alone for rich companies to use to sue?

    I think it's widely recognized that the Internet Archive is a general societal good; it should be funded as such.
  • So they're checking for past infringements. Is that the same as current infringement? I.e., even if you get a takedown notice and obey it, are you still equally guilty because once upon a time you were in violation? That's what it's sounding like here.
  • They killed Canada! Those bastards!

    - Peace
  • by Nom du Keyboard ( 633989 ) on Friday July 29, 2005 @04:54PM (#13198308)
    Requests from third parties to remove information are generally denied. The Wayback Machine makes exceptions in certain circumstances, for example if the Web pages contain personal information provided in confidence, such as medical data.

    I bet the (so-called) Church of Scientology gets everything they want pulled.

    In addition, Web-site operators can prevent material from remaining in the public domain by using a piece of computer code, known as a robots.txt file, which stops bots belonging to the Wayback Machine and regular search engines from copying pages.

    This is pretty bogus because it only works if there is still a current web-site at the spidered address that is on-line and can deliver a robots.txt file saying DON'T! It has already been proven in another case that rapid-fire multiple requests to WBM will cause it to give up pages even when robots.txt says not to.

    I see two ways to fix this problem of misuse of a valuable archive:

    1: Federal law PROHIBITING the use of evidence from the Wayback Machine in court trials. This is a valuable historical archive that will be less valuable if people worry that it can be used against them in the future in unforeseen ways, and block contributing to it. How many sites already block the WBM TCP/IP address range?

    2: WBM could simply announce that they refuse to cooperate in any future trials -- AND THEN DO EXACTLY THAT! Without them to attest to the accuracy of the retrieved data, many cases relying on that data would fall flat on their faces.

    Think for a moment. The WBM was not created to make lawyers' lives easier, and their law firms richer!

    • What I'd do is force WBM to either become legally compliant (i.e. archive their data on WORM media) or not be admissible in court.

      Problem solved, since WBM cannot possibly keep that much data on revision-proof media.
