
White House Website Limits Iraq-Related Crawling

Posted by simoniker
from the intriguingly-specific dept.
oscarcar writes "Dan Gillmor is reporting on the White House website's use of its robots.txt file to disable search engines from crawling certain material. Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."
This discussion has been archived. No new comments can be posted.

  • Funny (Score:5, Funny)

    by sulli (195030) * on Monday October 27, 2003 @04:25PM (#7322161) Journal
    whitehouse.com doesn't have that problem.
    • Re:Funny (Score:2, Funny)

      by Anonymous Coward
      neither does gwbush.com [gwbush.com]

      We have to give him credit for believing in the U.S. values enough not to shut the site down.

  • by Armethius (718200) <`jtunnell' `at' `utk.edu'> on Monday October 27, 2003 @04:26PM (#7322169)
    possibly to prevent people from finding changes to past statements and information when archived elsewhere
    Or possibly not...
    • better explanation would be?
      • Re:And your ... (Score:5, Insightful)

        by SQL Error (16383) on Monday October 27, 2003 @04:55PM (#7322542)
        Better explanation: Someone screwed up a search-and-replace in a major way. Many (most?) of those pages with "iraq" in them don't exist.

        It looks like someone blocked off parts of the site to web-crawlers; I don't know for sure why all those blah/bloo/iraq entries are in there but they sure as hell don't lead to anything.

        Censorship: 0
        Screwups: 100
        • re: and your ... (Score:4, Insightful)

          by ed.han (444783) on Monday October 27, 2003 @05:01PM (#7322604) Journal
          what's that old saying? "never attribute to malice that which can be attributed to stupidity" or something like that?

          let's not get reactionary here, folks. it wouldn't make sense to do what's being alleged:

          1. every major journalist worth his/her salt would be all over it within hours. so it wouldn't succeed in obscuring information.

          2. it would create an incredible backlash as soon as detected. what purpose would this serve?

          ed
          • Re: and your ... (Score:4, Insightful)

            by AllUsernamesAreGone (688381) on Monday October 27, 2003 @05:17PM (#7322787)
            "1. every major journalist worth his/her salt would be all over it within hours."

            Don't be naive. How long do you think that any mainstream journalist who made a story of this would have a job for? The answer - not long. The US media in particular, although the UK is getting as bad, is little more than a relay system for government propaganda and real, detailed, complete examination of government behaviour, with equal air time to truly dissenting opinions (how many times has Chomsky been on CNN in the past 4 months?) is out of the question. What the government does is Good and Right and Should Not Be Questioned.

            Media by the elite, serving the elite.
          • Re: and your ... (Score:5, Interesting)

            by Zeinfeld (263942) on Monday October 27, 2003 @05:52PM (#7323140) Homepage
            every major journalist worth his/her salt would be all over it within hours. so it wouldn't succeed in obscuring information.

            Where have you been living the past five years? Journalists don't criticize Bush.

            They still have not published the fact that he deserted from the national guard during Vietnam and they practically ignored his DUI conviction.

            The GOP has the media cowed with their constant 'liberal media' babble. The number of journalists who are prepared to hold Bush to account is tiny - Krugman, Conason, Ivins, Alterman. After that it's Al Franken, Jon Stewart and David Letterman.

            it would create an incredible backlash as soon as detected. what purpose would this serve?

            The chances that the mainstream media will pick this one up are very small. Just think how they would have reacted if it was Clinton!

  • upside (Score:5, Funny)

    by 514x0r (691137) on Monday October 27, 2003 @04:26PM (#7322171)
    it's good to see the whitehouse embracing technology so much.
  • by rot26 (240034) * on Monday October 27, 2003 @04:27PM (#7322178) Homepage Journal
    Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."

    Maybe, but I would think they might also be looking for "shady" spiders that ignored robots.txt. I wouldn't be surprised if there aren't a few honeypot pages in there too.
    • by RobertB-DC (622190) * on Monday October 27, 2003 @04:36PM (#7322300) Homepage Journal
      Maybe, but I would think they might also be looking for "shady" spiders that ignored robots.txt. I wouldn't be surprised if there aren't a few honeypot pages in there too.

      Oh, crap. I just plugged in /firstlady/images/iraq [whitehouse.gov], and now you tell me I'd better watch out. Damn this static IP address!

      Quick, Slashdot that link before the Agents get to my cube!
    • by sketerpot (454020) <sketerpot AT gmail DOT com> on Monday October 27, 2003 @04:39PM (#7322350)
      Honeypot or not, look at robots.txt. It's creepy: just about every entry is an Iraq-related page, and there are a lot of entries. If they wanted to just have a few honeypots, that shouldn't involve that many entries, or so many with the common theme of Iraq.
  • Cue somebody... (Score:5, Insightful)

    by Dave2 Wickham (600202) on Monday October 27, 2003 @04:27PM (#7322180) Journal
    Cue somebody to take a crawler (hell, even a bash script using wget) to specifically archive these pages. Hell, they could even use a user-agent which doesn't look like a bot.

    Of course, people would be less likely to trust random-Joe from the Internet than, say, The Wayback Machine, but I expect this is what will happen...
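
As a rough illustration of the first step such an archiving script would need, here is a minimal Python sketch that turns the Disallow entries of a robots.txt into a list of absolute URLs. The function name `disallowed_urls` and the sample text are hypothetical; the three paths are ones quoted elsewhere in this discussion, not the full White House file, and no fetching is done.

```python
from urllib.parse import urljoin


def disallowed_urls(robots_txt, base_url):
    """Return absolute URLs for every non-empty Disallow path in robots_txt."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line.lower().startswith("disallow:"):
            continue
        path = line.split(":", 1)[1].strip()
        if path:  # an empty Disallow means "nothing is disallowed"
            urls.append(urljoin(base_url, path))
    return urls


# Hypothetical sample built from entries quoted in this thread.
SAMPLE = """\
User-agent: *
Disallow: /infocus/iraq
Disallow: /firstlady/recipes/iraq
Disallow: /kids/barney/iraq
"""

if __name__ == "__main__":
    for url in disallowed_urls(SAMPLE, "http://www.whitehouse.gov/"):
        print(url)
```

The resulting list could then be handed to whatever downloader one prefers; that part is deliberately left out here.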
    • Re:Cue somebody... (Score:3, Interesting)

      by macshune (628296)
      I found the original code on Usenet, modified it, and left the original French comments in. Heh, originally they set the referer to the CIA to scare unsuspecting webmasters. Silly French :) This could easily be made to cycle through the robots.txt file, but I don't have the time right now, I'm in lab :)

      #!/usr/bin/perl -w

      use strict;

      use LWP::UserAgent;

      my $ua = new LWP::UserAgent;

      $ua->agent("Mozilla/4.0 \(compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322\)"); # super browser !

      my $req = new HT
  • Careful (Score:2, Funny)

    by BamaSlam (78998)
    Or you'll tear his tinfoil hat and then the black helicopters will be able to find him again.

    Nugs
  • by MechCow (561875) on Monday October 27, 2003 @04:30PM (#7322210) Homepage
    There doesn't seem to be anything big about this. I understand the origins of the robots.txt file were about keeping robots out of infinite loops and unimportant large file trees, but everyone knows they are also used to prevent google from indexing stuff people would rather keep (semi) private.

    If this was some crazy government conspiracy and they were trying to hide the information, why would they put it on their website? There could be any number of reasons they have done this. Perhaps they were getting loads of hits from Google about Iraq-related things, but if anyone really wants the information, surely they can just visit it.

    • by jpetts (208163) on Monday October 27, 2003 @04:55PM (#7322541)
      If this was some crazy government conspiracy and they were trying to hide the information, why would they put it on their website? There could be any number of reasons they have done this. Perhaps they were getting loads of hits from Google about Iraq-related things, but if anyone really wants the information, surely they can just visit it.

      Actually, the motivation around this could be to prevent caching of the documents, so that it's not so easy to compare differently dated versions of the same document. See this piece at Caltech [caltech.edu] for an example of how things change with time.
    • everyone knows they are also used to prevent google from indexing stuff people would rather keep (semi) private.

      The US government has no business with semi-private material. Either don't put it on the website, or make it publicly available to everyone, including Google and friends.
  • by Lizard_King (149713) on Monday October 27, 2003 @04:30PM (#7322213) Journal
    Disallow: /president/spongebobsquarepants_archive

    I didn't know gee-dub likes SpongeBob too! My nephew is gonna flip out when he hears this.
  • by mcc (14761) <amcclure@purdue.edu> on Monday October 27, 2003 @04:30PM (#7322217) Homepage
    Perhaps their goal is simply so that when people google or whatnot for information on the Bush Administration and Iraq, they will be likely to find the Bush Administration's current views on and actions in Iraq, rather than outdated material?

    Completely ignoring for the moment the fact that these views and actions are really somewhat embarrassing for the Bush administration, this really makes sense from a practical viewpoint. Few things are as annoying as searching for something news-ish and finding primarily material from two years ago. And after all, if they ONLY were interested in people forgetting the old materials, they could have just removed those materials from the site totally. (Though perhaps they were aware removing the materials completely would cause mirrors, which would be fully searchable, to spring up.)
    • But rather than preventing the search of this information, why not mark it as such? In fact, I'll bet it's already dated per page.

      I agree that this is yet another step in the misinformation campaign surrounding the current administration. The policies that we've heard flip through hoops like trained seals. There's just no logic to all the reversals of focus, the "misquotes" and the public snafus we've seen happen. This is just another one of them.
    • It is rather embarrassing to find another view of your own opinions in google cache... lol...

  • If you're surprised by this, THAT's the news, not what the White House is doing with this information control. Click here [missouri.edu] for a list of the White House's policies with restricting FOI and other related requests since Sept 11th.

    This isn't partisan politics, either. The Republican party has been trying to keep Bush from violating the Presidential Records Act.

    Yes, yes, the country's at war. Makes you wonder why Bush doesn't want anybody to know about communications between Reagan and his advisors.
    • by Barbarian (9467)
      There hasn't been a real declared war since WWII. You can't "declare war on terrorists" and be done with it either, wars are supposed to be declared on countries when you go to fight them. It was what an honorable nation would do before hostilities.
      • by flossie (135232) on Monday October 27, 2003 @05:27PM (#7322903) Homepage
        An honourable country would not keep people imprisoned in Guantanamo Bay without either giving them PoW status or charging them with a specific offence and giving them the right to a fair trial, including free, unhindered and unmonitored access to legal counsel.
        • To be a POW you have to have been captured wearing a recognisable uniform, and be part of an established fighting force of a government.
          I suspect that many of the people captured met neither condition.
          • by flossie (135232) on Monday October 27, 2003 @07:27PM (#7323933) Homepage
            To be a POW you have to have been captured wearing a recognisable uniform, and be part of an established fighting force of a government. I suspect that many of the people captured met neither condition.

            In which case they should be charged with something, either spying (unlikely if they were in their own country) or something else. They should then have the opportunity to defend themselves in open court with the ability to avail themselves of all the rights guaranteed by the Universal Declaration of Human Rights [un.org] which the US has signed up to. If US soldiers in Britain arrested me, I would not be wearing a recognisable uniform because I am not part of the military or any recognisable fighting force of government. That does not give them the right to forcibly remove me from my home country and lock me up without ever even charging me with anything! The actions of Bush and his cohorts in the Whitehouse are absolutely disgusting.

            • While I agree that by now there ought to have been more transparency from the US govt regarding the Guantanamo Bay detainees, if you had SHOT at U.S. soldiers in an engagement in Great Britain, you'd be an illegal combatant. This is pretty much why these people are being detained.

              These people were probably by and large draftees, which unfortunately in Afghanistan, meant they weren't going to _get_ a uniform. They certainly have a right to public trial, but by and large they were probably arrested legi
        • by jgardn (539054)
          An honorable country would not pack 19 men onto airplanes to crash into civilian buildings, trapping the people inside to choose between a burning hell or a jump to certain death.

          Damn the terrorists to hell! I pray to God that He will strike all those who think like the terrorists down, and thrust them into the deepest recesses of hell. How can He be a God of Justice and Love if He allows this kind of crap to go on unpunished? They are not honorable, and they should feel DAMN lucky we didn't go and slaught

      • > There hasn't been a real declared war since WWII. You can't "declare war on terrorists" and be done with it either, wars are supposed to be declared on countries when you go to fight them.

        Also, US wars have to be declared by the Congress rather than by the White House... or at least that's the way it worked back when the Constitution still meant something.

  • Seriously though... (Score:4, Interesting)

    by MyNameIsFred (543994) * on Monday October 27, 2003 @04:31PM (#7322232)
    ...possibly to prevent people from finding changes to past statements and information when archived elsewhere...

    While anything is possible in politics, is it possible that the web admin is trying to limit the amount of traffic on the site? Is it possible that his analysis of the weblogs show a lot of traffic from robots looking for Iraqi-related info?

  • by burgburgburg (574866) <splisken06NO@SPAMemail.com> on Monday October 27, 2003 @04:31PM (#7322233)
    or even considering that previous statements might not match current statements means that the terrorists win. The WH Ministry of Truth works hard to ensure that the spin for the day gets out to the party faithful above the filters of "news" with their "facts" that don't gibe with the message we're trying to deliver.

    If you persist in contemplating a world where whatever statements that the WH puts out, no matter how they might seem to contradict previous statements, are not totally true and correct, then a relocation expert from Guantanamo will be by in a few minutes. Just step away from the computer.

  • All of the now non-spidered pages can be located in Room 101.
  • Everything Iraq.... (Score:5, Informative)

    by c_oflynn (649487) on Monday October 27, 2003 @04:31PM (#7322238)
    It looks like 99% of the stuff related to Iraq is filtered out in robots.txt.

    But not a problem: on google.com I just specify the site by saying 'Iraq site:whitehouse.gov' and it had 14,000 hits... the first one is the root of the /infocus/iraq directory (which is disallowed in robots.txt)
  • by Dlugar (124619) on Monday October 27, 2003 @04:33PM (#7322258) Homepage

    Obviously, they're keeping people from accessing the top-secret teeball Iraq files [whitehouse.gov] ! Besides:

    Disallow: /teeball/iraq/
    check out these other frightening examples of censorship:
    Disallow: /kids/spotty/iraq

    Disallow: /kids/eggroll/iraq
    Disallow: /kids/barney/iraq
    Disallow: /easter/iraq
    Disallow: /mrscheney/iraq
    Disallow: /national-anthem/iraq

    Truly frightening.
    • Sssshhh! The terrorists will see this!
    • by Barbarian (9467)
      There could be 10 lines in that whole file designed to prevent pages being archived, and the rest are garbage thrown in for confusion/as bad-robot honeypots.
    • by mykepredko (40154) on Monday October 27, 2003 @05:09PM (#7322689) Homepage
      Downloading the "robots.txt" file and doing a quick ctrl-f on different words, I discovered that there are six instances of "Barney" coming up in the robots.txt:

      Disallow: /holiday/2002/barney/iraq
      Disallow: /holiday/2002/barney/text
      Disallow: /kids/barney/iraq
      Disallow: /kids/barney/text
      Disallow: /kids/photoessays/barney/iraq
      Disallow: /kids/photoessays/barney/text

      Which is the same number as "cheney", "powell" had 4, "saddam" didn't have any and "bush" only comes up with "bushpets".

      Clearly, there is something to do with Barney and Iraq that The White House doesn't want you to know about.

      myke
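
The ctrl-f tally above can be reproduced with a few lines of Python. This is only a sketch: `count_entries` is a made-up helper, and `BARNEY_LINES` contains just the six entries quoted in the comment above rather than the whole file, so counts for other names are not reproduced here.

```python
def count_entries(robots_txt, keyword):
    """Count Disallow entries whose path contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    count = 0
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line.lower().startswith("disallow:"):
            continue
        # Only look at the path, not at the "Disallow:" field name itself.
        path = line.split(":", 1)[1].strip().lower()
        if keyword in path:
            count += 1
    return count


# The six "barney" entries quoted in the comment above.
BARNEY_LINES = """\
Disallow: /holiday/2002/barney/iraq
Disallow: /holiday/2002/barney/text
Disallow: /kids/barney/iraq
Disallow: /kids/barney/text
Disallow: /kids/photoessays/barney/iraq
Disallow: /kids/photoessays/barney/text
"""

if __name__ == "__main__":
    for word in ("barney", "iraq", "text"):
        print(word, count_entries(BARNEY_LINES, word))
```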
  • Consider the fact that GW Bush has banned media (hello?? freedom of the press? 1st Amendment??) coverage of returning killed soldiers. Why? Because seeing dead soldiers makes people realize that the war is real and people are dying.

    The current administration is trying its damnedest to control information that it doesn't like
  • by wardomon (213812) on Monday October 27, 2003 @04:34PM (#7322269)
    welcome our White House Robot Overlords. It would be funnier if it weren't true.
  • by release7 (545012) on Monday October 27, 2003 @04:35PM (#7322276) Homepage Journal
    From 1984:

    Winston's greatest pleasure in life was in his work. Most of it was a tedious routine, but included in it there were also jobs so difficult and intricate that you could lose yourself in them as in the depths of a mathematical problem -- delicate pieces of forgery in which you had nothing to guide you except your knowledge of the principles of Ingsoc and your estimate of what the Party wanted you to say. Winston was good at this kind of thing. On occasion he had even been entrusted with the rectification of the Times leading articles, which were written entirely in Newspeak. He unrolled the message that he had set aside earlier. It ran:

    times 3.12.83 reporting bb dayorder doubleplusungood refs unpersons rewrite fullwise upsub antefiling

    In Oldspeak (or standard English) this might be rendered:

    The reporting of Big Brother's Order for the Day in the Times of December 3rd 1983 is extremely unsatisfactory and makes references to non-existent persons. Rewrite it in full and submit your draft to higher authority before filing.

  • by Captain Morgan (160029) <cmorgan&alum,wpi,edu> on Monday October 27, 2003 @04:35PM (#7322279) Homepage
    It could be something innocent but really, why would anyone want to keep search engines out of a publicly funded website? People have been accusing the poster of "baseless accusations" but the guy does have a point. I've seen a couple of GW's speeches and afterwards the transcripts of those speeches and noted that grammatical errors were corrected. While this is only a minor offence in editing history it does make you wonder what other opinions and information may have appeared and then later have been edited. Seriously, these are our government officials here, we deserve to have an unedited record of what they say and to hold them to it. A little bit of speculation on the reasons for excluding various terms is far from paranoia.

    Chris
  • From the robots.txt file:

    Disallow: /easter/iraq

    Does this mean they're going to ban Christmas in Iraq too?
  • related links (Score:5, Interesting)

    by js7a (579872) * <[gro.kivob] [ta] [semaj]> on Monday October 27, 2003 @04:35PM (#7322288) Homepage Journal
    A couple of web sites that (1) have in the past done a great job of catching these kind of things, and (2) have mailing lists you can subscribe to:

    Here's a minor example of something those two sites didn't catch: Remember Iraq's so-called "mobile biological weapons factories" [fas.org]? A month after the story broke that they were for weather balloons [slashdot.org], the CIA moved their report's URL [informatio...house.info].

    An intriguing fact about this whitehouse.gov/*/iraq thing is that they do in fact cover some of the important statements [bway.net] which are apparently not duplicated in the press release, conference, and briefing directories. Perhaps there was a "unique urgency" to cover up some poor choices of words?

  • by sipping some Victory! gin and smoking some Victory! cigarettes.

    wow, a webmaster changed his robots.txt. i'm amazed.
  • From the robots.txt file:

    Disallow: /kids/barney/iraq

    Thank goodness they're limiting the export of that blasted purple dinosaur!
  • by Quasar1999 (520073) on Monday October 27, 2003 @04:38PM (#7322328) Journal
    You found it didn't you? It failed... congratulations, you have somehow circumvented the government's website security system, prepare for the wrath of the DMCA, backed by none other than Bush himself!

    Well either that, or it's simply preventing search engines from indexing honeypot type pages used for mis-information... Either or... but I like the first version... since it's more paranoid, and I have plenty of tinfoil ready to be shaped into hats... ;)
  • Many excluded items in the robots.txt file involve mentions of Iraq, possibly to prevent people from finding changes to past statements and information when archived elsewhere."

    Or maybe, just maybe, they're doing it to save their server from being constantly crawled by paranoid conspiracy-theorists looking for changed statements and information.

  • SHHH!!! (Score:2, Funny)

    by jpmahala (181937)
    Goodness knows we can't have googlebots archiving all of those top-secret/confidential web pages at the whitehouse [whitehouse.gov]. I guess we'll just have to live with the top-secret info that has already been archived.

    What's that? Oh, all of the real top-secret stuff is at the NSA website [nsa.gov]?

    Never mind then.

  • by Have Blue (616) on Monday October 27, 2003 @04:41PM (#7322367) Homepage
    If you try actually *loading* the directories listed in the robots.txt [whitehouse.gov], they don't exist. Not one. Not by going to their index.html or trying to find them through the site navigation. While they could still be accused of deleting them, many of the links are unlikely to have existed in the first place (http://www.whitehouse.gov/president/heartland-tour-gallery/iraq? /president/holiday/decorations/iraq? /president/tee-ball-01/iraq? ) This may be just some IT grunt running a bad script on robots.txt.
    • by borkus (179118)
      An odd webmaster choice maybe? I wonder if they generate the robots.txt based on a 404 report - something like
      • Grep the errors log for 404's from search engines.
      • Parse out the directory paths.
      • Add those to robots.txt.

      Which might explain why at least one of the directories - /infocus/iraq/ - clearly has an index [whitehouse.gov]. However, if they moved or renamed a file under that path, it might be generating 404's. From personal experience, I've had bad requests from Googlebot for files that were over 4 years old.
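
The three steps guessed at above could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the log is taken to be in the common "combined" access-log format, the bot detection is a crude substring check on the user-agent, and the sample log lines, addresses, and file names are invented. Nothing here is known about how the White House file was actually produced.

```python
import posixpath
import re

# Matches the request, status, size, referer and user-agent fields of a
# "combined"-style access log line (an assumed format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (\S+)[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

# Hypothetical markers used to guess that a request came from a crawler.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def disallow_lines_from_log(log_lines):
    """Build sorted, de-duplicated Disallow lines from crawler 404s."""
    dirs = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        path, status, agent = match.groups()
        if status != "404":
            continue
        if not any(marker in agent.lower() for marker in BOT_MARKERS):
            continue
        # Step 2: keep only the directory part of the missing URL.
        directory = posixpath.dirname(path.split("?", 1)[0])
        # Never emit "Disallow: /", which would block the whole site.
        if directory not in ("", "/"):
            dirs.add(directory)
    # Step 3: one Disallow line per directory.
    return ["Disallow: " + d for d in sorted(dirs)]


# Invented sample log lines (addresses are from a documentation range).
SAMPLE_LOG = [
    '192.0.2.10 - - [27/Oct/2003:16:25:00 -0500] "GET /infocus/iraq/old.html HTTP/1.0" 404 512 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.11 - - [27/Oct/2003:16:25:05 -0500] "GET /news/missing.html HTTP/1.1" 404 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
    '192.0.2.10 - - [27/Oct/2003:16:25:09 -0500] "GET /index.html HTTP/1.0" 200 2048 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.12 - - [27/Oct/2003:16:26:00 -0500] "GET /infocus/iraq/other.html HTTP/1.0" 404 512 "-" "ExampleSpider/1.0"',
]

if __name__ == "__main__":
    print("\n".join(disallow_lines_from_log(SAMPLE_LOG)))
```

Note that a scheme like this blocks the whole directory of a moved file, which would be consistent with a live directory such as /infocus/iraq/ ending up in the list.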

      I have

  • by steveit_is (650459) on Monday October 27, 2003 @04:43PM (#7322397) Homepage
    Most of the pages in the robots.txt are actually 404's and don't exist anymore. It's that simple. Keeps the robots from constantly requesting content that doesn't exist anymore. A few are blocked because they are bandwidth intensive videos and things, and some others are blocked for more mundane reasons I assume.
  • Wayback Machine (Score:3, Informative)

    by BLuP1 (641290) on Monday October 27, 2003 @04:45PM (#7322432)
    The Wayback machine does archive robots.txt, it seems like the whitehouse updates this file about every week or so. The current update happened after April 13th, 2003, and it simply took all of those references that said ".../.../.../text" and added /iraq as well.

    Seems odd and pointless to me. I'd like a statement explaining it. A lot like the "Disallow: /hidden/passwd" kind of entries.
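
To make the described change concrete, here is a small Python sketch of a script that would produce it: for every existing entry ending in /text, add a sibling entry ending in /iraq. The helper name and the ordering of the output are assumptions; how the real file was edited is not known.

```python
def add_sibling_entries(lines, old_suffix="/text", new_suffix="/iraq"):
    """Copy lines, adding a new_suffix twin after each Disallow ending in old_suffix."""
    out = []
    for line in lines:
        out.append(line)
        stripped = line.rstrip()
        if stripped.startswith("Disallow:") and stripped.endswith(old_suffix):
            # Swap the final path component, e.g. .../barney/text -> .../barney/iraq
            out.append(stripped[: -len(old_suffix)] + new_suffix)
    return out


if __name__ == "__main__":
    before = [
        "User-agent: *",
        "Disallow: /cgi-bin",
        "Disallow: /kids/barney/text",
        "Disallow: /firstlady/recipes/text",
    ]
    print("\n".join(add_sibling_entries(before)))
```

Run blindly over every directory, a script like this would explain entries such as /firstlady/recipes/iraq that point at nothing.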

  • by msheppard (150231) on Monday October 27, 2003 @04:46PM (#7322436) Homepage Journal
    Looks like someone just added IRAQ to all of the existing links. It's obviously some sort of search/replace/copy function. Go look for yourself, I found this one:

    Disallow: /firstlady/recipes/iraq

    Now, how many pages would this possibly block?

    M@
    • Looks like someone just added IRAQ to all of the existing links. It's obviously some sort of search/replace/copy function. Go look for yourself, I found this one:

      Disallow: /firstlady/recipes/iraq



      Soylent Green is Iraqi people!
  • by dekashizl (663505) on Monday October 27, 2003 @04:54PM (#7322525) Journal
    Before you get carried away with the Iraq issue, realize that most of these are just leftover from previous administrations:
    Disallow: /~billc/pics/nudity/hillary
    Disallow: /~billc/mysexpics/oral/monica
    Disallow: /~billc/mysexpics/mf
    Disallow: /~billc/mysexpics/mff
    Disallow: /~billc/mysexpics/mfff
    Disallow: /~billc/mysexpics/mm
    Disallow: /~billc/mysexpics/gorebjs
    Disallow: /~billc/mysexpics/goats
  • by buckminster (170559) on Monday October 27, 2003 @04:58PM (#7322569) Homepage
    It appears that this robots.txt file was probably auto-generated. It looks like someone used a script to crawl the site's entire directory structure appending /iraq and /text to every directory. In the process they seem to have created a pretty complete map of the site's underlying directory structure -- not necessarily a good thing.

    Having said that, I'm not even sure that this robots.txt file would work the way it's supposed to. Seems like these iraq references should all have a trailing slash or a .html if they're actual pages.
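
On the trailing-slash question: in the original robots exclusion convention a Disallow value is matched as a path prefix, so an entry without a trailing slash or extension still covers anything that begins with it. The sketch below checks this with Python's standard `urllib.robotparser`, which does plain prefix matching; the crawler name `ExampleBot` is made up, and other crawlers may of course implement matching differently.

```python
from urllib.robotparser import RobotFileParser

# A one-rule robots.txt using an entry from the file under discussion.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /infocus/iraq",
])


def allowed(path):
    """Would a rule-abiding crawler be allowed to fetch this path?"""
    return parser.can_fetch("ExampleBot", "http://www.whitehouse.gov" + path)


if __name__ == "__main__":
    for path in ("/infocus/iraq", "/infocus/iraq/", "/infocus/iraq.html",
                 "/infocus/iraq/index.html", "/infocus/"):
        print(path, "allowed" if allowed(path) else "blocked")
```

So for a prefix-matching crawler the entries would work as written, and would also catch any sibling path that merely starts with the same characters.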

    Someone clearly doesn't want Google caching Whitehouse content on Iraq. The question is why? And how come they're so lame about it?

  • by MillionthMonkey (240664) on Monday October 27, 2003 @05:06PM (#7322651)
    # robots.txt for http://www.ingsoc.gov/

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /search
    Disallow: /query.html
    Disallow: /help
    Disallow: /appointments/eurasia
    Disallow: /appointments/eastasia
    Disallow: /ask/images/eurasia
    Disallow: /ask/images/eastasia
    Disallow: /deptofhomeland/analysis/eurasia
    Disallow: /deptofhomeland/analysis/eastasia
    Disallow: /deptofhomeland/eurasia
    Disallow: /deptofhomeland/eastasia
    Disallow: /economy/eurasia
    Disallow: /economy/eastasia
    Disallow: /goodbye/eurasia
    Disallow: /goodbye/eastasia
    Disallow: /government/handbook/eurasia
    Disallow: /government/handbook/eastasia
    Disallow: /government/images/eurasia
    Disallow: /government/images/eastasia
    Disallow: /government/eurasia
    Disallow: /government/eastasia


    And now, an offering for the lameness filter...

    Oceania was at war with Eastasia: Oceania has always been at war with Eastasia. A large part of the political literature of five years was now completely obsolete. Reports and records of all kinds, newspapers, books, pamphlets, films, sound tracks, photographs- all had to be rectified at lightning speed. Although no directive was ever issued, it was known that the chiefs of the Department intended that within one week no reference to the war with Eurasia, or the alliance with Eastasia, should remain in existence anywhere. The work was overwhelming, all the more so because the processes that it involved could not be called by their true names. Everyone in the Records Department worked eighteen hours in the twenty-four, with two three-hour snatches of sleep. Mattresses were brought up from the cellars and pitched all over the corridors; meals consisted of sandwiches and Victory Coffee wheeled round on trolleys by attendants from the canteen. Each time that Winston broke off for one of his spells of sleep he tried to leave his desk clear of work, and each time that he crawled back sticky-eyed and aching, it was to find that another shower of paper cylinders had covered the desk like a snowdrift, half burying the speakwrite and overflowing onto the floor, so that the first job was always to stack them into a neat-enough pile to give him room to work. What was worst of all was that the work was by no means purely mechanical. Often it was enough merely to substitute one name for another, but any detailed report of events demanded care and imagination. Even the geographical knowledge that one needed in transferring the war from one part of the world to another was considerable.

    This was written in 1948. Things have really progressed!

  • by flossie (135232) on Monday October 27, 2003 @05:14PM (#7322750) Homepage
    There is a very simple explanation for this, as anyone who has read 1984 should know. In order for the glorious government to effectively serve the greater good, they need to be able to communicate changes of policy quickly and effectively. If, for instance, the enemy in a war changes, it is necessary to quickly update all documents that describe how evil the enemy are. Rather than manually editing all the documents, it is much easier to have one generic word, say "text", which can then be altered as appropriate:

    sed 's/text/iraq/g'
    sed 's/text/iran/g'
    sed 's/text/cuba/g'
    sed 's/text/belgium/g'
    etc.

    Obviously robots.txt just happened to be in the path!

  • by MoneyT (548795) on Monday October 27, 2003 @05:23PM (#7322848) Journal
    So, am I to understand that the same administration that was smart enough to rig an election, smart enough to cause 9/11, smart enough to forge evidence and go to war is the same administration that came up with the brilliant plan of HIDING information by putting it in a PUBLICLY available file?
    • No, it's just the kind of subtle manipulation this administration has perfected. They probably realized that if they pulled all kinds of documents from the web site that it'd appear as if they were limiting access to the public record.

      It's all still there for all to see, but it's not as easy to find. So they can say "We're not hiding anything." while they actually hide it.

      Things that become inconvenient or embarrassing after the fact are hard to hide. At the time this quote by Dick seemed reasonabl
  • by VivianC (206472) <internet_update@yah[ ]com ['oo.' in gap]> on Monday October 27, 2003 @05:28PM (#7322912) Homepage Journal
    This really shouldn't shock anyone. It has been going on at the White House for ages. Look at this clip from the robots.txt file from 1998:
    Disallow: /history/photoessays/blueroom/blowjobs
    Disallow: /history/photoessays/blueroom/text
    Disallow: /history/photoessays/cabinetroom/blowjobs
    Disallow: /history/photoessays/cabinetroom/text
    Disallow: /history/photoessays/crosshalls/blowjobs
    Disallow: /history/photoessays/crosshalls/temp/blowjobs
    Disallow: /history/photoessays/crosshalls/temp/text
    Disallow: /history/photoessays/crosshalls/text
    Disallow: /history/photoessays/diplomaticroom/blowjobs
    Disallow: /history/photoessays/diplomaticroom/text
    Disallow: /history/photoessays/downstairscorridor/blowjobs
    Disallow: /history/photoessays/downstairscorridor/text
    Disallow: /history/photoessays/easter/2002/blowjobs
    Disallow: /history/photoessays/easter/2002/text
    Disallow: /history/photoessays/easter/2003/defenselink/blowjobs
    Disallow: /history/photoessays/easter/2003/defenselink/text
    Disallow: /history/photoessays/easter/2003/blowjobs
    Disallow: /history/photoessays/easter/2003/text
    Disallow: /history/photoessays/easter/one/blowjobs
    Disallow: /history/photoessays/easter/one/text
    Disallow: /history/photoessays/easter/three/blowjobs
  • by helix400 (558178) on Monday October 27, 2003 @05:37PM (#7322996) Journal
    So, someone finds a problem with blocking search engine bots.

    1) First, a lot of these docs involve Iraq. So, without real factual information, it's assumed they're trying to do something fishy regarding Iraq info.
    2) Using that assumption, the next assumption is that they're purposely trying to keep people from trying to find contradictory statements.

    This could all be true, or it couldn't be. Either way, making two assumptions without any real facts is just pathetic yellow journalism.
  • by Chanc_Gorkon (94133) <gorkon@@@gmail...com> on Monday October 27, 2003 @06:24PM (#7323459)
    Correct me if I am wrong, but the data is still there, right? Also, wasn't the purpose of robots.txt (for crawlers that honor it) to stop crawlers from incessantly crawling the page and sapping your bandwidth? I just don't feel that this is a big issue. If they made it not searchable from the main White House page, that's when I would have issues. They are just trying to save themselves bandwidth. Pages like these Iraq pages are probably updated often. They'd be getting crawled constantly.
  • by pclminion (145572) on Monday October 27, 2003 @07:48PM (#7324125)
    So, if somebody like Google blatantly defied the robots.txt and crawled the entire site anyway, would this piss off the White House? We all know that robots.txt is a "gentleman's" agreement to not go certain places. It isn't an authentication or access control mechanism.

    Would the White House sue for violation of the robots.txt file? Under what laws could they sue? Is robots.txt an implicit grant of permission to view copyrighted content? Would GWB press the Congress for a new bill, to mandate legal enforcement of the robots.txt?

    That's probably not going to happen anytime soon, but it raises an interesting question. Is robots.txt legally enforceable? And if it was, would that be a good thing or a bad thing?

    Your thoughts?
