White House CIO Describes His 'Worst Day' Ever
dcblogs writes "In the first 40 days of President Barack Obama's administration, the White House email system was down 23% of the time, according to White House CIO Brook Colangelo, the person who also delivered the 'first presidential Blackberry.' The White House IT systems inherited by the new administration were in bad shape. Over 82% of the White House's technology had reached its end of life. Desktops, for instance, still had floppy disk drives, including the one Colangelo delivered to Rahm Emanuel, Obama's then chief of staff and now Mayor of Chicago. There were no redundant email servers."
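For a sense of scale, 23% downtime over 40 days is roughly nine full days without email. A quick back-of-the-envelope check:

```python
# Rough arithmetic on the figure in the summary: 23% email downtime
# over the first 40 days of the administration.
days = 40
downtime_fraction = 0.23

downtime_days = days * downtime_fraction
downtime_hours = downtime_days * 24

print(f"{downtime_days:.1f} days ({downtime_hours:.0f} hours) of email downtime")
# -> 9.2 days (221 hours) of email downtime
```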
OMB IT has their hands tied. (Score:5, Interesting)
Re:Floppy Drives! (Score:5, Interesting)
OK, perspective is called for: Obama took the White House in 2009, and right up until 2009 HP was still shipping floppy drives as STANDARD equipment on its business desktops.
Yes, sitting in 2012 we can all agree that floppy drives have been obsolete for years, but in 2009 HP was still shipping them as standard.
The note about Dell Dimensions is nice, but those are "home" computers, not "professional".
And that 6-year-old software? I can guarantee you it was Office 2003. Sure, as Bush was preparing to leave office his staff could have gone around and upgraded everyone to the latest and greatest version of Office (Office 2007), but it is now 2012 and the latest version of Office on PCs is 2010. Does that have 100% market penetration, or are there a few stragglers on 2007 or even 2003?
Maybe, like most office users at the time, the Bush White House wasn't a big fan of the ribbon interface introduced in Office 2007 [wikipedia.org].
Re:Not a bad number (Score:4, Interesting)
Gates: We need five nines of uptime
Ballmer: Engineering, we need 9 + 9 + 9 + 9 + 9 uptime.
Engineering Manager: Guys, our uptime goal is 45%
Engineering: We already deliver about 72%.
Engineering Manager: Steve, we actually have 9 + 9 + 9 + 9 + 9 + 9 + 9 + 9 uptime!
Ballmer: Bill, we're so stable we have 8 nines of uptime! Let's see our competitors beat that!
Gates: Great Steve, let's add some more bloat and see if we can bring that number down some so we leave ourselves with room for improvement.
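For anyone doing the math on the joke above: "five nines" means 99.999% availability (about five minutes of downtime per year), not a sum of digits. A quick sketch of what N nines actually buys you:

```python
# "N nines" of availability means uptime of 1 - 10**-N,
# not 9 * N percent (which is where the 45% in the joke comes from).
def downtime_hours_per_year(nines: int) -> float:
    """Hours of allowed downtime per year for N nines of availability."""
    availability = 1 - 10 ** -nines
    return (1 - availability) * 365 * 24

for n in range(1, 6):
    print(f"{n} nines: {downtime_hours_per_year(n):.3f} hours/year")
# Five nines works out to about 0.088 hours, i.e. ~5.3 minutes/year.
```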
Here's the real story first hand (Score:5, Interesting)
This article is partially correct but leaves out the actual technical issues involved.
Someone *from* that datacenter, who was there at the time, here. Here's what really happened.
The old administration did not care about the existing IT infrastructure because they were on their way out. They wanted no changes made, just that things be left up. Yes, the email system was old and past EOL, but the outages were really the perfect storm of everything that could hit the fan actually hitting the fan at the same time.
The facility was doing work on the power system, the UPS to be specific. Somewhere along the line they messed up and cut the power. *All* of the power. Datacenter goes dark. They brought the power back up, but then tripped it again before bringing it up for good. This detail is what caused the weekend of hell.
The SAN that the clustered email servers (yes, clustered, they *were* redundant) had the stores on was an EMC Symmetrix. It has a built-in battery backup system so that if the SAN loses power it has enough stored energy to flush the cache to disk. The power going off started this process. The power coming back on triggered the response to stop flushing the cache and start checking and rebuilding. Then the power went off again. This is where the specific details get hazy, but in effect the SAN did not like this. I don't believe it had enough power left to totally flush the cache, and/or it did not have the logic built in to handle an outage while in recovery mode. The result was a downed SAN that *would not come back up*. Now all of the data was down and nothing could be done but wait for the vendor to show up and try to fix it.
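The sequence described above can be sketched as a tiny state machine. To be clear, this is a toy model of the failure mode, not EMC's actual Symmetrix firmware logic; the state names and transitions are my own illustration of why a second outage mid-recovery is the worst case:

```python
# Toy model (NOT real Symmetrix firmware) of a battery-backed write
# cache: flush on power loss, check/rebuild on power restore, and no
# defined handling for losing power again while still rebuilding.
class WriteCache:
    def __init__(self):
        self.state = "NORMAL"  # NORMAL, FLUSHING, REBUILDING, DOWN

    def power_lost(self):
        if self.state == "NORMAL":
            self.state = "FLUSHING"      # battery drains cache to disk
        elif self.state == "REBUILDING":
            self.state = "DOWN"          # no logic for this path

    def power_restored(self):
        if self.state == "FLUSHING":
            self.state = "REBUILDING"    # abort flush, check and rebuild

san = WriteCache()
san.power_lost()      # first outage: start flushing the cache
san.power_restored()  # power back: abort flush, start rebuilding
san.power_lost()      # second outage during rebuild
print(san.state)      # -> DOWN: wait for the vendor
```

A single outage walks cleanly through FLUSHING and REBUILDING back to service; it's the second power cut, landing inside the recovery path, that leaves the array stuck.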
At the same time we were dealing with *every* server being off and having to come back up. There were hundreds. Luckily most did. Some did not. Some were important, such as the case where *both* servers in a clustered system would not boot, which just so happened to be the system that some of the, shall we say, "more important" VIPs were on. These were old systems running Exchange 2000 on Windows 2000. Long past due, but kept up by the staff since the EOP would not approve a new email infrastructure.
Eventually the systems were restored and everything was back online. In the meantime, though, Brook thought it would be a good idea to spend untold amounts of money to bring in MS engineers to look at things. They cost a lot of money and made a bunch of reports, but they didn't fix a damn thing. The staff that was already there found the issues with the servers and fixed them.
There were later headaches, such as the time the SONET was cut (thanks, Verizon!) and further SAN maintenance, but that was the weekend from hell.
Things to note: