Wayback Machine Safe, Settlement Disappointing
Jibbanx writes "Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. The suit stemmed from HA's anger over the Wayback Machine showing pages archived from their site even after they added a robots.txt file to their web server. While the settlement is good for the Internet Archive, it's also disappointing, because a trial would have tested HA's claims in court. As the article notes, you can't really un-ring the bell of publishing something online, which is exactly what HA wanted to do. Obeying robots.txt files is voluntary, after all, and if the company didn't want the information online, they shouldn't have put it there in the first place."
Simple post (Score:3, Informative)
Info published on the Internet... (Score:3, Insightful)
I'll no doubt have lawyers (and lawyer wannabes) protesting, but that only follows the literal and common-sense meaning of "public domain," instead of the legal rationalization which has bee
Re: (Score:2)
Re: (Score:3, Insightful)
Re:Info published on the Internet... (Score:5, Informative)
If I post your credit card and bank information on a forum site, does that mean it is now public domain and you have no protection?
If I post on a forum site that I am selling stolen credit card info and bank info, my post should not be touched, because it is public domain and it should be freely available?
Re: (Score:3, Interesting)
If anything bad comes from it, it only means that the banks employ weak security. That information by itself should mean nothing. Complain to the financial institutions, not the person who posts it. Make it the bank's problem and it will go away. Don't use their services until they make it secure without making it unduly inconvenient for the customer. The silly passwords and 20
Re: (Score:2, Insightful)
You know the current standard the US follows for copyright of printed works is LIFE+70 years? That means that once the author copyrights their work, the copyright is good for 70 years after they die. Only after the copyright expires, and is not renewed, does the work become public domain.
http://onlinebooks.library.upenn.e [upenn.edu]
Re: (Score:2, Informative)
Re: (Score:2, Insightful)
My friend's hosting service got hacked. We caught it right away, before a site had been put into place, but the individuals attempted to put up the site http://paypal-protect.org./ [paypal-protect.org.] We shut them down quickly. They went on to hack another host, and currently have their little phishing site up and running. I suggest you go to the site and, without using ANY real information, log in with a bogus email and password, and check it out. If you take a
Re: (Score:2)
I don't see what your point is. Surely if these guys are engaged in criminal activity as you suggest, and have contact information, they should be investigated and arrested. The FBI (etc) should take over the contacts and shut them down, or use them to entrap other thieves. It
Re: (Score:2)
No, I read it all. But the point is you want to be judge, jury and executioner. Let the police deal with him. Don't complain about being obstructed from going vigilante. Sure, you may be righteous, but not everyone is.
Besides, who is actually going to search an archive of an old forum when they want to find a reliable criminal to deal with? There are plenty of live forums where you can do this almost openly.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
So you think the possibility of compromised personal information, even though they do have a policy to remove it, outweighs all of the other valuable uses? You've effectively cited one negative use, and (considering the removal policy, see my reply to the parent poster) one that isn't very sound at that. Positive uses: academic research, law enforcement, catching racist or otherwise offensive language on politicians' websites, tracking down produc
Re: (Score:2)
But isn't the purpose of copyright to extend legal protection beyond "controlled, private distribution"?
After all, photocopiers, VCRs, audio tape recorders, CD/DVD writers -- heck, the printing press -- mean that distribution is no longer controlled or private, unless you restrict access to people who can use them. (Or you try to make it technically difficult via DRM, but that's only a temporary
Re: (Score:2)
Those are examples of private distribution. By "controlled private distribution," I do not mean avoiding distribution to the public through regular sales channels, where there exists a definite relationship between buyer and seller. When you buy a CD, that is a private transaction between you and the seller. It is controlled (you get the CD after you'
Re: (Score:2)
I think that is too restrictive. You should automatically give up your copyright if you show your work to the public by any means.
Re: (Score:2, Insightful)
I want.... (Score:5, Funny)
I want a search engine that only indexes items excluded in the robots.txt file
Re: (Score:3, Interesting)
What's interesting is that I've heard of robots that do exactly that. It may have been here on Slashdot, but I've heard of people putting stuff in the exclude list in their robots.txt, and some robots _ONLY_ searched those files.
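That works because robots.txt is itself a public file: every Disallow line tells any reader, polite or not, which paths the site owner would rather keep out of indexes. A minimal sketch of pulling those paths out of a robots.txt that has already been downloaded (the function name and sample file are hypothetical, and it deliberately ignores user-agent groups, Allow lines and wildcards):

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path in a robots.txt file.

    Hypothetical helper for illustration: it ignores user-agent grouping,
    Allow lines and wildcards, and simply lists what the file advertises.
    """
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # An empty "Disallow:" means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/  # unfinished posts
Disallow:
"""
print(disallowed_paths(sample))  # ['/private/', '/drafts/']
```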
A world without cooperation (Score:5, Insightful)
Even if you don't fear the legal system, disregarding robots.txt can quickly get you in trouble. There are junk scripts that feed bots endlessly, and there are automatic blocklists for misbehaving bots. If people program their bots to ignore robots.txt, these and possibly more proactive self-defense mechanisms will become the norm. Is that the net you want? Maybe obeying robots.txt is the better alternative, don't you think?
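For a crawler author, honoring robots.txt is cheap: Python's standard library ships urllib.robotparser for exactly this check. The sketch below parses an in-memory robots.txt (a made-up example) instead of fetching one, so it runs offline; "ExampleBot" is a hypothetical user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/notes.html"):
        # Prints True for the first URL and False for the second.
        print(url, may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

A real crawler would call set_url() and read() to fetch the live file, and cache the parsed result per host rather than re-parsing it for every URL.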
Re: (Score:3, Interesting)
Furthermore, there are perfectly good ways to lock content away from the outside in a more rigorous way, pa
Re:A world without cooperation (Score:5, Insightful)
You've got it backwards (Score:2)
See: that company DID NOT HAVE a robots.txt directive active when the Wayback machine archived it. They put the robots directive up two we
Re: (Score:2)
Re: (Score:2)
In the case of robots.txt, these sanctions could very well be a court ruling against you.
This is especially important with regard to services which mirror webpages. Doing so without the (assumed) consent of the author is a straightforward copyright violation
Even if you don't fear the legal system, disregarding robots.txt can quickly ge
Re: (Score:3, Informative)
Let's have a look at what the RFC draft (it's not even official) says about robots.txt:
"Web site administrators must realise this method is voluntary, and is not sufficient to guarantee some robots will not visit restricted parts of the URL space."
"It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and th
what your government DOESN'T want you to know (Score:2)
Think about it: anything on this list IS NOT on Google.
why???
Autolawyers (Score:4, Insightful)
If Congress were serious about keeping the US economy "safe and effective", it would reform the "lawyers' job security" laws. Instead it will surely make them even worse, and make the lawyer tax on technology mandatory.
Re: (Score:3, Insightful)
Re: (Score:2)
"Unless lawyers are paid by the state, like doctors in Canada, they cannot be considered officers of the court who's job it is to represent your rights before said court. Once they accept payment from a client, either actual or pending, they become no more [than] hired sales consultants [peddling] their [clients'] version of the truth."
Second, there is no distinction between being an advocate for a client's version of the truth, and being an advo
Re: (Score:2)
Re:Autolawyers (Score:5, Insightful)
There's probably a way to ensure that lawyers represent people's rights better than they do now. Regular random audits of billings and practices. More "contempt of court" punishment. More suspended/revoked licenses, especially for repeated frivolous representation. More "malpractice" awards. There ought to be more competition, with more standardized reviews contextualizing all those "scores", published for consumers.
Lawyers even more than doctors hide behind consumer ignorance and blind "respect". Exposing their performance as part of the shopping process would make them more competitive, and better adhere to the required "ethics" that usually are assumed to come with the tie.
Re: (Score:2)
Re: (Score:3, Interesting)
Yes, a more prolific lawyer should be more likely to be audited. Probably every nth case (by all lawyers) should have an audit initiated secretly to follow the proceedings, reporting malpractice as it's observed, so corrections aren't applied only after the case is derailed. That does
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
FWIW, I'm not really "a liberal", but I did notice that more Conservative justices overturn Congress more often than less Conservative justices [blogspot.com]. Which makes calling them "Conservative" ironic, and makes the Conservative
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I wasn't trying to make a Republican/Democrat statement; I was trying to cover the old adage:
"Conservative" = Likes things the way they are.
"Liberal" = Likes to try new things.
Re: (Score:2)
Re: (Score:2)
I don't think I said that. I keep trying to make neutral statements, and I keep getting attacked.
I guess I just learned lesson one of the Internet: don't bother arguing a neutral position against someone who obviously has an axe to grind.
Re: (Score:2)
Re: (Score:3, Informative)
I don't see that happening any time soon -- http://www.yourcongress.com/ViewArticle.asp?artic
Re: (Score:2)
Re: (Score:2)
Don't need no Wayback (Score:5, Funny)
Re: (Score:3, Funny)
Quick! Get those people some Rounded Corners and Gradients!
Welcome to Web 2.0!
Re: (Score:2)
But the site doesn't even look mildly professional. I could have made that page back in high school, and I SUCK at web design.
Re: (Score:2)
Re: (Score:3, Insightful)
Initial impressions go a long way. It may seem silly to some people, but in business it can mean the difference between people taking you seriously and buying your product, or not.
Re: (Score:2)
Go Slashdot!
I sense a little two-faced opinion here (Score:5, Insightful)
So by that logic, if I didn't want AOL to release my search information, I shouldn't be mad, as it's my fault for having used them in the first place? Or that if I want my copyrighted information to not be republished by someone else, I should just simply not publish at all? How about: if I don't want my GPL code resold by someone in a closed-source product, I should just know better and not put it out in the open to begin with? And that if I post something stupid when I'm 9, it should follow me around throughout my entire lifetime, because a 9-year-old should know better?
Re: (Score:2)
The Wayback thing always told you when a snapshot was from; it never tried to pass it off as current.
Re: (Score:2)
I sense a collection of poor analogies here (Score:2, Interesting)
You never intended to make your search results publicly available. These guys intentionally made their web page publicly available.
Or that if I want my copyrighted information to not be republished by someone else, I should just simply not publish at all?
That's a better point, but the question is whether the Wayback Machine "republished copyrighted material". If t
Re: (Score:2)
It is more like releasing it as public domain code, then someone puts it in a commercial product, then you change your mind and re-release it as GPL, then you sue the people who made the commercial product. And you should lose that case.
Re:I sense a little two-faced opinion here (Score:5, Interesting)
Another example: someone I know wrote an essay that he thought only people in his class would ever see. It contained one or two mildly embarrassing disclosures, not terribly personal, but not something you'd want a complete stranger to know about you. Some idiot put it up on the school web site without his permission.
Here's a nasty possibility. Suppose somebody unintentionally publishes information useful to terrorists. DHS drops by and points out the error, and the information is withdrawn. Does Wayback Machine have a right to keep the information online?
In fact, Wayback Machine has never asserted their right to keep anything online. As the article points out, they'll remove stuff that's noncompliant with the current robots.txt, even though it was compliant at the time it was spidered. This lawsuit wasn't about their right to keep stuff online. It was just somebody accusing them of being negligent about enforcing their own policies.
Re: (Score:2)
I want to remove archives of my websites for hostnames/domains that are no longer connected to the internet. Obviously, the robots.txt method cannot work here.
Re: (Score:2)
Re: (Score:2)
Why don't you just play the child pornography card instead? At least that's *illegal*, unlike putting publicly available information online instead of keeping it hidden in some dusty library guarded against terr'ists by a librarian.
The fact is, if something is actually illeg
Re: (Score:2)
Who said anything about putting publicly available information online? It might, for example, be private information about a building that makes it easier to blow it up. "Our new death star has a state of the art venting port, located for easy access at ..."
It's funny that you accuse me of bad faith, since you're lumping me in with the Bush administration's crazy attempts to control information. I didn't say anything about censorship. I simply pointed out that a web site can have legitimate reasons for
erasing history (Score:2)
I really hate that. When I want to find some info about some hardware made by a long-defunct company, I find old Usenet posts referencing their website. That domain has now been taken over by some scumbag who has filled it full of porn and Viagra ads. I go to the Wayback Machine and find ALL the
George? Is that you? (Score:2)
Re: (Score:2)
Your information is useful to people. Terrorists are people. Therefore, your information is useful to terrorists.
Therefore, you need to refrain from posting any information that is useful to people. Therefore, Slashdot is OK.
Re:I sense a little two-faced opinion here (Score:5, Insightful)
If you post something on the net, then I can point my browser to it; there is no privacy, and nor was there any expectation of it. I could have run wget -r -erobots=off on your page every day and got all its content, happily ignoring your robots.txt, and I'd have that archive even after you deleted it or moved it into some private archive. Since obeying robots.txt is voluntary, I simply chose not to.
News websites often want you to pay for older content, but there is nothing theoretically stopping you from saving all the content day by day. You are comparing apples and oranges.
Here's the summary: we posted evidence online that was used against us in a court of law, we lost, we sued the people who provided that evidence, and because it's cheaper to settle than deal with bloody lawyers, we settled with them.
Re: (Score:3, Interesting)
if you give private information to AOL and they release it publicly then you can get upset
if you post private information on "check-out-my-ssn.com" and its public to the whole world then you can't get mad.
Re: (Score:3, Insightful)
Its purpose is not to censor information but to prevent incidents caused by aggressive robots that could stress WWW servers (see the introduction in the first link).
HA action is revisionism. Like a politician yelling something then a few years later claiming he never said such a thing and threatening people with a piece of evidence to the contrary.
If you don't want it read... (Score:4, Insightful)
People shouldn't put anything on the Internet that they wouldn't want their worst enemy, boss, NSA, or grandmother to see. Obviously, since the porn industry exists online, few people follow this rule, but it's a good one nonetheless.
I enjoy Archive.org and when I get nostalgic about my websites of the past, it's there to show me a glimpse into history.
Re: (Score:2)
It might be fairly secure... but it's on the web. Point is, everything will eventually be on the web; it's only a matter of whether you trust the security of the site. Should you trust the security of MySpace? No.
Re: (Score:2)
Lack of real information security is the trade-off we made as a computerized, networked society for convenience in banking. Given the effort saved in banking, I'd say it's worth it, even with the identity scams that plague thousands of people every year. Crime happens whether it's online or off.
Re: (Score:2)
metaphorically speaking (Score:2, Troll)
For the life of me I can't figure out what ringing a bell and publishing something online have in common. Maybe if we didn't use digital clocks we could turn back the sands of time and use a different mixed metaphor instead?
Re: (Score:3, Informative)
Re: (Score:2)
Re: (Score:2)
[Curmudgeon]Un-ring? Bah! Nonsense.[/Curmudgeon]
But.... (Score:2, Informative)
Retroactive robots.txt (Score:5, Insightful)
First, some background. I have a weblog I've been running since 2002, switching from B2 to WordPress and changing the permalink structure twice (with appropriate HTTP redirects each time) as nicer structures became available. Unfortunately, some spiders kept hitting the old URLs over and over again, despite the fact that they forwarded with a 301 permanent redirect to the new locations. So, foolishly, I added the old links to robots.txt to get the spiders to stop.
Flash forward to earlier this week. I've made a post on Slashdot, which reminds me of a review I did of Might and Magic IX nearly four years ago. I head to my blog, pull up the post... and to my horror, discover that it's missing half a sentence at the beginning of a paragraph and I don't remember the sense of what I originally wrote!
My backups are too recent (ironic, that), so I hit the Wayback Machine. They only have the post going back to 2004, which is still missing the chunk of text. Then I remember that the link structure was different, so I try hitting the oldest archived copies of the main page, and I'm able to pull up the summary with a link to the original location. I click on it... and I see:
Excluded by robots.txt (or words to that effect).
Now this is a page that was not blocked at the time that ia_archiver spidered it, but that was later blocked. The Wayback machine retroactively blocked access to the page based on the robots.txt content. I searched through the documentation and couldn't determine whether the data had actually been removed or just blocked, so I decided to alter my site's robots.txt file, fire off a request for clarification, and see what happened.
As it turns out, several days later, they unblocked the file, and I was able to restore the missing text.
In summary, the Wayback Machine will block end-users from accessing anything that is in your current robots.txt file. If you remove the restriction from your robots.txt, it will re-enable access, but only if it had archived the page in the first place.
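The behaviour described here can be modeled in a few lines. This is a sketch of the policy as the poster observed it, not archive.org's actual code: a snapshot is shown only if it was captured at all and the site's current robots.txt lets the archiver's crawler fetch that URL. The agent name ia_archiver comes from the post above; the URLs and robots.txt contents are made up.

```python
from urllib.robotparser import RobotFileParser

ARCHIVER_AGENT = "ia_archiver"  # crawler name mentioned in the post


def is_viewable(archived_urls: set[str], current_robots_txt: str, url: str) -> bool:
    """Model (an assumption, not archive.org code) of retroactive exclusion:
    visibility depends on the robots.txt served today, not the one that
    was in place when the page was crawled."""
    if url not in archived_urls:
        return False  # never captured, so there is nothing to unblock
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(ARCHIVER_AGENT, url)


old_post = "http://blog.example.com/archives/2002/10/mm9-review.html"
archive = {old_post}

blocked = "User-agent: *\nDisallow: /archives/\n"
unblocked = "User-agent: *\nDisallow:\n"

print(is_viewable(archive, blocked, old_post))    # False: hidden, but still stored
print(is_viewable(archive, unblocked, old_post))  # True: visible again
print(is_viewable(set(), unblocked, old_post))    # False: nothing was archived
```

Note that in this model the blocked snapshot is never deleted from the archive set, which matches the poster's experience of getting the page back after editing robots.txt.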
Re: (Score:2)
That's pretty cool. I wish more software behaved in a manner that well thought out.
Re: (Score:2)
That's uncool.
Re: (Score:2)
You know the type:
and of course, robots.txt was
What REALLY pisses me OFF (Score:5, Insightful)
After a certain domain had been out of use for years, some adware/search-rank link farm, whatever it is, added a robots.txt file to the "hijacked" domain.
One can now get formerly accessible sites removed from archive.org. EVEN IF THE ORIGINAL OWNER NEVER INTENDED TO.
Check out their robots.txt... (Score:3, Interesting)
Re: (Score:3, Interesting)
Wayback Machine essential for public domain (Score:4, Interesting)
As more content moves online, the idea of publishing a work becomes blurred. Revisions years later can effectively update the copyright of the work, if the reader cannot distinguish when the content was created. So the Wayback Machine will hopefully provide that resource. The amount of potentially public-domain content there is huge.
As a side note, it will be interesting to see when the first GPL programs (for example) lose their copyright. Of course, by then, the languages will seem more than archaic.
Re: (Score:2)
Re: (Score:2)
Right, I was operating under that assumption. Therefore, it is very important that we have a record of what existed at a given point in time.
What I don't know for certain is the answ
Isn't ignoring robots.txt unauthorised access? (Score:2)
In the UK Computer Misuse laws, there is the concept of unauthorised access. It is an offence to access data on a computer system without authorisation.
Typically it is assumed that access to data held on a publicly available website, without notice to the contrary, is authorised. A notice displayed stating that you should
Re: (Score:2)
That sounds rather absurd. It's like posting a massive page of text in a busy public location, with a sticky note attached saying "do not read this text."
I would think that in terms of computer networks, "unauthorized access" means breaki
Re: (Score:2)
The IA does exactly that -- it respects robots.txt. Further, it RETROACTIVELY applies robots.txt. Now, this may not work (which is what the complaint was about). And AFAIK the retroactive edit doesn't remove data, it simply doesn't allow visibility (which is one of the reasons it may not work -- if there are two separate paths to the data, and the data is there, it can still be retrieved).
The devil's advocate argument would be that IA may be necessary to retai
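The "two separate paths" weakness follows from how robots.txt rules match: in the original convention, and in Python's urllib.robotparser, each rule is a prefix of the URL path, so blocking one path says nothing about the same content served under another. A small illustration using made-up URLs for an old and a new permalink to the same post:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site: the same post is reachable under an old and a new URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""
OLD_URL = "http://blog.example.com/archives/42.html"
NEW_URL = "http://blog.example.com/2002/10/mm9-review/"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are matched as path prefixes, so only the old path is excluded.
print(parser.can_fetch("ia_archiver", OLD_URL))  # False
print(parser.can_fetch("ia_archiver", NEW_URL))  # True
```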
Does anyone here know what copyright is?! (Score:3, Insightful)
Pretty much every time we have a discussion about the legality of web/Usenet archive sites, the only argument with any legal weight that's given for what would otherwise be a clear infringement of copyright is that the rightsholder is implicitly consenting to certain uses by making the material available on that medium. The degree to which this holds in general is debatable, and AFAIK has never been tested in any major court case in any jurisdiction. However, even if robots.txt is voluntary, it's a clear statement of intent. There is no way you can claim implicit permission to copy the material when the supplier explicitly indicated, using a recognised mechanism, that they did not want it copied.
That makes comments like this one by Doc Ruby [slashdot.org] and this one by saskboy [slashdot.org] seem a little presumptuous, IMNSHO.
HA by law should have to give up the data (Score:2)
Violated their Own Policies (Score:2)
Put in a robots.txt.
Direct Wayback to index what you want or don't.
THAT DIRECTION IS APPLIED TO FILES ON THEIR SITE FROM PREVIOUS VERSIONS.
Meaning, if you deny all, and their bot sees it, all of your stuff is supposed to get deleted from the archive.
If they didn't do that they violated their own policy.
True, there can be complications (such as switching domain names) that might keep any given text in there wit
What about robots.txt in/from the future? (Score:2)
It may still be voluntary today, but who knows what the future will bring?
I, for one, welcome our robot.txt overlords.
wrong (Score:2, Interesting)
The robots.txt file is a clear indication of the conditions under which a copyright holder gives you access to their copyrighted materials. As such, it is not "voluntary".
In addition to probably being in violation of copyright law, it is simply rude for companies to ignore robots.txt files; if the Internet Archive does this, they are badly behaved.
If courts should decide that robots.txt files can be ignored at will, th
Wrong, wrong, wrong (Score:4, Informative)
Wrong, wrong, wrong. archive.org explicitly tells you that if you want your content removed from their index, that you should modify your robots.txt and re-submit your site, and when their bot reads your robots.txt and sees the appropriate directives, your content will be dropped from the index. See:
http://www.archive.org/about/faqs.php#2 [archive.org]
http://web.archive.org/web/20050305142910/http://
Let's review the text here, just in case someone from archive.org scurries to change it:
Addendum: An Example Implementation of Robots.txt-based Removal Policy at the Internet Archive
By not honoring those directives, are they not engaging in both copyright infringement and fraud?
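For reference, the kind of robots.txt entry this removal policy keys on looks like the one below: a group naming the Archive's crawler with a blanket Disallow, plus an open group for everyone else. The ia_archiver agent name comes from earlier in the thread; check archive.org's own FAQ for the exact directives it currently honors. The Python part just confirms, with the standard-library parser, which agents such a file shuts out ("SomeOtherBot" is a hypothetical agent).

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that excludes only the Internet Archive's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.com/index.html"
print(parser.can_fetch("ia_archiver", page))   # False: the whole site is excluded
print(parser.can_fetch("SomeOtherBot", page))  # True: other crawlers are unaffected
```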
Re: (Score:2)
Would their robot obey or ignore the directives when crawling archive.org?
Re: (Score:2)
... I could make it so you were never born. (Score:4, Interesting)