Data Mining Used Hard Drives
linuxwrangler writes "One hopes the /. crowd knows the perils of discarding storage with sensitive data but this article drives home the point. Two MIT grad students bought used drives from eBay and secondhand computer stores. Among the data found on the 158 drives were 5,000 credit-card numbers, porn, love-letters and medical information."
DPA (Score:5, Informative)
Full Article Text (Score:2, Informative)
JUSTIN POPE, AP Business Writer Wednesday, January 15, 2003
(01-15) 13:17 PST CAMBRIDGE, Mass. (AP) --
So, you think you cleaned all your personal files from that old computer you got rid of?
Two MIT graduate students suggest you think again.
Over two years, Simson Garfinkel and Abhi Shelat bought 158 used hard drives at secondhand computer stores and on eBay. Of the 129 drives that functioned, 69 still had recoverable files on them and 49 contained "significant personal information" -- medical correspondence, love letters, pornography and 5,000 credit card numbers. One even had a year's worth of transactions with account numbers from a cash machine in Illinois.
About 150,000 hard drives were "retired" last year, according to the research firm Gartner Dataquest. Many end up in the trash, but many also find their way back onto the market.
Over the years, stories have surfaced about personal information turning up on used hard drives, raising concerns about privacy and the danger of identity theft.
Last spring, Pennsylvania sold used computers that contained information about state employees. In 1997, a Nevada woman bought a used computer and discovered it contained prescription records on 2,000 customers of an Arizona pharmacy.
Garfinkel and Shelat, who reported their findings in an article to be published Friday in the journal IEEE Security & Privacy, said they believe they are the first to take a more comprehensive -- though not exactly scientific -- look at the problem.
On common operating systems such as Microsoft's Windows, simply deleting a file, or even following that up by emptying the "trash" folder, does not necessarily make the information irretrievable. Those commands generally delete a file's name from the directory. But the information itself can live on until it is overwritten by new files.
Even reformatting a drive, or preparing the hard drive all over again to store files, may not do it. Fifty-one of the 129 working drives in the MIT study had been reformatted, and 19 of them still contained recoverable data.
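To make the point concrete, here is a toy model in Python. It is hypothetical and far simpler than any real filesystem: the directory is just a table mapping names to block numbers. Deleting removes the table entry, and a quick format (in this model) simply empties the table, yet a recovery tool that ignores the table and reads the raw blocks still finds the data. `ToyDisk`, `carve` and the block size are illustrative names and values only.

```python
# Toy model, not any real filesystem: the disk is a list of fixed-size
# blocks and the directory maps each file name to its block numbers.
BLOCK_SIZE = 16


class ToyDisk:
    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.directory = {}  # file name -> list of block numbers

    def _free_blocks(self):
        in_use = {n for nums in self.directory.values() for n in nums}
        return [n for n in range(len(self.blocks)) if n not in in_use]

    def write_file(self, name, data):
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        free = self._free_blocks()
        if len(chunks) > len(free):
            raise OSError("toy disk is full")
        targets = free[:len(chunks)]
        for n, chunk in zip(targets, chunks):
            self.blocks[n] = chunk.ljust(BLOCK_SIZE, b"\0")
        self.directory[name] = targets

    def delete(self, name):
        # Only the name is forgotten; the blocks keep their contents
        # until a later write happens to reuse them.
        del self.directory[name]

    def quick_format(self):
        # In this model a quick format just starts a fresh, empty directory.
        self.directory.clear()

    def carve(self):
        # What a recovery tool does: ignore the directory, read raw blocks.
        return b"".join(b.rstrip(b"\0") for b in self.blocks if any(b))


disk = ToyDisk(8)
disk.write_file("letter.txt", b"meet me at noon, love B.")
disk.delete("letter.txt")
print(disk.directory)  # {}
print(disk.carve())    # b'meet me at noon, love B.'
```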
The hard-to-erase quality of hard drives is seen as a good thing by some. Many users like believing that, in a pinch, an expert could recover their deleted files. Law enforcement officers can examine a computer and lift incriminating e-mails or porno images from the hard drive.
The only sure way to erase a hard drive is to "squeeze" it: writing over the old information with new data -- all zeros, for instance -- at least once, but preferably several times. A one-line command will do that for Unix users, and for others, inexpensive software from companies such as AccessData works well.
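Applied to a single file rather than a whole drive, the overwrite idea might look like the Python sketch below. The name `overwrite_file` and its parameters are illustrative, not from the article. It assumes the filesystem writes the new bytes over the file's existing blocks in place, which the shred man page quoted later in this thread warns is not true on journaled filesystems, and it cannot touch copies left in swap, temp files or remapped sectors.

```python
import os


def overwrite_file(path, passes=3, chunk_size=64 * 1024):
    """Overwrite every byte of an existing file with zeros, `passes` times.

    Assumes the filesystem writes the new data over the old blocks in place;
    it cannot reach copies left in swap, temp files, or remapped sectors.
    """
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(zeros[:n])
                remaining -= n
            f.flush()             # empty Python's buffer...
            os.fsync(f.fileno())  # ...and ask the OS to put this pass on disk
```

After something like `overwrite_file("old_journal.txt")`, deleting the file with `os.remove` only discards a name that points at zeros (under the in-place assumption above).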
But few people go to the trouble. Many ordinary computer users toss their old drives into the closet, or take a sledgehammer to them.
As it turned out, most of the hard drives acquired by the MIT students came from businesses that apparently had a misplaced confidence in their ability to "sanitize" old drives.
Tom Aleman, who heads the analytic and forensic technology group at the accounting firm Deloitte & Touche, often encounters companies that get burned by failing to fully sanitize, say, the laptop of an employee who leaves the company for a job with a competitor.
"People will think they have deleted the file, they can't find the file themselves and that the file is gone when, in fact, forensically you may be able to retrieve it," he said.
Garfinkel has learned his lesson. As an undergrad at MIT in the 1980s, he failed to sanitize his own hard drive before returning a computer to his father. His father was able to read his personal journal.
Re:Luckily for me, my Ebay'd hard drives are safe (Score:5, Informative)
PGP! (Score:5, Informative)
That said, experts would tell you that the only reliable way to make sure sensitive data doesn't get out is to thermite your drive.
Also, what's the one-line Unix command? (I'm running Mac OS X here.)
Re:HD Abuse (Score:2, Informative)
I think that would render the drive useless. =)
Probably not. Most commercial hard drives are rated for at least 50 g's of acceleration. My Deskstar is good for up to 100. You might dent the outer case, but it'll probably still work.
Re:just shoot the drive (Score:5, Informative)
Because these, at least for the most part, weren't personal drives but drives that companies had thrown away.
From the article:
"As it turned out, most of the hard drives acquired by the MIT students came from businesses that apparently had a misplaced confidence in their ability to "sanitize" old drives."
Scary.
CIA (Score:5, Informative)
With regard to wiping data, do yourself a favor and check out http://www.heidi.ie/eraser/
Beyond the wonderful wiping the program does, there is the option to make an emergency boot floppy that wipes the HD with a DoD-style 7-pass or a Gutmann 35-pass overwrite! Nifty for the paranoid.
Re:PGP! (Score:1, Informative)
Works well with a few runs of 'dd if=/dev/urandom of=/drive/to/random'.
Re:PGP! (Score:2, Informative)
Note: This is a "Linux-centric" answer to the question since
You may also want to fill the hard drive with (semi)random data.
$ dd if=/dev/urandom of=/dev/hda
If you do this for a couple of weeks you should be fine
Re:PGP! (Score:5, Informative)
Ah, the joys of *nix.
Re:You don't need any external software! (Score:4, Informative)
Re:Speaking of data recovery (Score:2, Informative)
Re:Speaking of data recovery (Score:2, Informative)
Re:Oh, man. Hear it comes. (Score:5, Informative)
Re:All Saddam's email are belong to us! (Score:2, Informative)
The bodies of the drives were mostly magnesium, and I came away with about $250 from the scrap metal dealer.
Of course, who knows what I breathed by sanding those platters...
That's fake, bud (Score:2, Informative)
http://www.videopremiereawards.com/HTMLNews/New
a few minutes with tomsrtbt (Score:5, Informative)
Re:PGP! (Score:3, Informative)
Re:start an extortion & blackmail company.. (Score:3, Informative)
*sigh*
From the terms of use page [icanstillt...febill.com] on this site:
"Please note, the content of this interactive movie, including characters and any and all elements, hereof, is entirely fictional, and is not based upon any actual individual or of any other legal entity"
grib.
shred(1) will securely delete files (Score:5, Informative)
http://btr0xw.rz.uni-bayreuth.de/cgi-bin/manpag
See also http://www.cs.auckland.ac.nz/~pgut001/pubs/secure
That Rarely Works Any More (Score:3, Informative)
This is NOT Data Mining! (Score:5, Informative)
GNU shred is your friend (Score:3, Informative)
Enter GNU shred. Its default operation makes 25 passes over the drive, with passes such as random data, random patterns and all zeros. Theoretically, the drive has been overwritten so many times that there is almost no chance of recovering data.
Of course, just to play it safe, I'll also run it across my stereo speakers a few times.
Re:PGP! (Score:1, Informative)
Re:Not so bad. (Score:4, Informative)
Also that same year, the school counselor retired his trusty Quadra 610(?), and it had all the psychological, academic, and disciplinary records from 1993 onward on it. No password. No encryption. No attempt even to get rid of the data.
A few months back, my brother picked up an old computer for $8 at a garage sale. He wanted me to fix it up for him and get it to do something. I was in for a nasty surprise when I found about 200 MB of gay pr0n JPEGs on there.
When I was taking my A+ class at my HS, we were given some old computers from the county office of education to get in working order to give to people who couldn't afford computers. There was a small text file on it that contained passwords for most of the servers in the COE.
You can get quite a bit without even recovering files. People are idiots.
Better options than dd (Score:2, Informative)
For stuff like medical data, financial data, etc., I'd seriously consider looking into wipe [sourceforge.net] instead, which uses Peter Gutmann's patterns.
Re:PGP! (Score:3, Informative)
PGP (for Windows or Mac, i.e. not GPG) has two commands related to this: wipe file and wipe free space.
And for those wishing for only mid-grade free space wiping, check out "cipher" which comes with Win XP and Win2K SP3. 'cipher /w:c:' will wipe all the free space on c: with 0s, then with 1s, then with random data.
I have mine cron'ned - er, "Task Scheduled" - to run several times a week, just to keep things on the sanitary side. You never know when the layoffs will leave you wondering who is looking at your old hard drive.
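A free-space wipe of the kind described for `cipher /w` is commonly done by creating a scratch file and growing it until the volume is full, so every free block gets written, then deleting it. The Python sketch below follows the 0s, then 1s, then random sequence mentioned above. It is not a description of how `cipher` works internally; the scratch-file name and the `max_bytes` cap (there so it can be tried without filling a real disk) are hypothetical.

```python
import errno
import os


def wipe_free_space(directory, max_bytes=None, chunk_size=1 << 20):
    """Fill free space with a scratch file three times (0x00, 0xFF, random),
    then delete it. max_bytes caps the file (hypothetical knob for testing);
    with None, each pass writes until the volume reports it is full.
    Returns the number of bytes written in the last pass."""
    path = os.path.join(directory, "wipe_scratch.tmp")  # hypothetical name
    passes = [lambda n: b"\x00" * n, lambda n: b"\xff" * n, os.urandom]
    written = 0
    try:
        for make_chunk in passes:
            written = 0
            # buffering=0 sends each write straight to the OS, so "disk full"
            # shows up here instead of when the file is closed
            with open(path, "wb", buffering=0) as f:
                try:
                    while max_bytes is None or written < max_bytes:
                        n = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
                        written += f.write(make_chunk(n))
                except OSError as e:
                    if e.errno != errno.ENOSPC:
                        raise  # only "no space left" means the pass is done
                try:
                    os.fsync(f.fileno())  # push this pass to the device
                except OSError as e:
                    if e.errno != errno.ENOSPC:
                        raise
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written
```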
Re:DPA (Score:5, Informative)
Nope. A magnetic field that would be strong enough to erase a hard drive would probably also compress it into a lump of twisted metal. From http://www.usenix.org/publications/library/proceedings/sec96/full_papers/gutmann/ [usenix.org]:
The only way to be really sure is to use an acetylene torch.
Re:I sledge them! (Score:3, Informative)
how hard is it to smash platters? (Score:1, Informative)
the platters are fairly rigid, so when you smash them they disintegrate into tiny, tiny pieces that are usually impossible to recover (most of the platter ends up in 1/32nd bits or smaller; that's why the paper towel is there, to keep micro splinters from getting wedged in your skin).
otherwise, just wedge a screwdriver between the casing and platter, and smash platter by leverage.
no one can read data off of dust.
Re:Luckily for me, my Ebay'd hard drives are safe (Score:2, Informative)
Re:Oh, man. Hear it comes. (Score:5, Informative)
Re:shred(1) will securely delete files (Score:5, Informative)
$ man shred
[snip]
CAUTION: Note that shred relies on a very important assumption: that the filesystem overwrites data in place. This is the traditional way to do things, but many modern filesystem designs do not satisfy this assumption. The following are examples of filesystems on which shred is not effective:
* log-structured or journaled filesystems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
[snip]
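Why a journaled or log-structured design defeats shred can be seen in a toy Python model. It is hypothetical and much simpler than ext3 or ReiserFS: every write, including an "overwrite", is appended to a log and the file's pointer moves to the new copy, so the old block stays untouched until the filesystem reuses it (which this toy never does). `ToyLogFS` and `shred_like` are illustrative names, not real APIs.

```python
class ToyLogFS:
    """Hypothetical log-structured store: writes are appended, never in place."""

    def __init__(self):
        self.log = []    # raw blocks in the order they hit the disk
        self.index = {}  # file name -> position of its current block in the log

    def write(self, name, data):
        self.log.append(data)
        self.index[name] = len(self.log) - 1

    def read(self, name):
        return self.log[self.index[name]]


def shred_like(fs, name, passes=3):
    # Overwrite the file "in place", the way shred assumes it can
    size = len(fs.read(name))
    for _ in range(passes):
        fs.write(name, bytes(size))


fs = ToyLogFS()
fs.write("diary.txt", b"dear diary")
shred_like(fs, "diary.txt")
print(fs.read("diary.txt"))     # ten zero bytes: the file looks wiped
print(b"dear diary" in fs.log)  # True: the original block is still on "disk"
```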
Re:Not so bad. (Score:3, Informative)
Re:Luckily for me, my Ebay'd hard drives are safe (Score:2, Informative)
Or you could use Eraser [heidi.ie].
It's free, as a bonus, and its floppy-based killer uses Gutmann's algorithm to do its bit.
-- R
Re:HD Abuse (Score:2, Informative)
For a 25 foot fall with (nearly) no drag, the drive will get up to a speed of about 40 ft/sec (27.3 MPH). If the drive stops over a 1/8" distance with -uniform deceleration- (this is pretty generous for a fall onto concrete), this equates to about 2400 G's. Halve the distance and you double the force. Decelerate it in a non-uniform fashion (as it realistically would) and you'll get even more spectacular results.
See this review of a Hitachi drive [open-mag.com]. Note that they say a drive designed for a non-operating shock of 800 G's can take a fall of -one foot- onto concrete. I destroyed a Maxtor by dropping it 3 feet onto carpet in a past life, and I'd suspect it was rated for a non-operating shock of at least 50 G's.
I'd love to see you try it with your drive with your valuable data sometime though.
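The drop arithmetic can be checked in a few lines. Free fall gives an impact speed of v = sqrt(2gh), and stopping uniformly over a distance d takes a deceleration of v^2/(2d), so the peak load in g's is simply h/d; the drive's mass cancels. The sketch below assumes standard gravity, no drag and uniform deceleration; `drop_impact` is an illustrative name.

```python
import math

G_FT_S2 = 32.174  # standard gravity, ft/s^2


def drop_impact(height_ft, stop_distance_in):
    """Drag-free fall from height_ft, then uniform deceleration over
    stop_distance_in. Returns (impact speed ft/s, speed mph, peak g's)."""
    v = math.sqrt(2 * G_FT_S2 * height_ft)  # v^2 = 2gh
    d = stop_distance_in / 12.0             # inches -> feet
    decel = v * v / (2 * d)                 # v^2 = 2ad  =>  a = v^2 / 2d
    return v, v * 3600 / 5280, decel / G_FT_S2


for d_in in (1 / 8, 1 / 16):
    v, mph, gs = drop_impact(25, d_in)
    print(f'stop over {d_in}": {v:.1f} ft/s ({mph:.1f} MPH), about {gs:.0f} g')
```

Because the load is h/d, halving the stopping distance doubles it, and a 25 ft drop stopped over 1/8" (1/96 ft) comes to 25 x 96 = 2400 g.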
Re:Random Bit Overwrite (Score:5, Informative)
Can anyone tell me why there have to be numerous random-bit passes when one could do something like this:
dd if=/dev/zero of=/dev/hda bs=512
What's wrong with just zeroing out the drive once?
Say the child porn file has a one bit and a zero bit. You overwrite it with two zero bits. The magnetic domains where the one bit was are presumably weaker or smaller because they were flipped, not reinforced like the zero bit domains. Of course the drive's read head itself won't be useful for extracting this information, because it's only designed to determine the last bit written by the write head- a binary zero/one determination. But with special equipment you can measure domain strengths carefully, and pull more information than a single bit out of them. You can tell which domains were flipped by the zero-out process and which were reinforced. (Of course this is a simplification because each bit is composed of multiple domains.)
So there are a few trivially obvious considerations when writing an erasing program-
-Don't write zeroes, write ones and zeroes.
-Go in more than one pass. A single pass leaves the bits in 4 possible states- (0,0), (0,1), (1,0), and (1,1) (where (c,r) are the child-porn and random-overwrite bits, respectively). An attacker can in theory tell all four states apart by close physical examination, so he knows c. Two passes (c,r1,r2) leave 8 possible states- (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), and (1,1,1). Now the attacker's equipment needs more than twice as much precision, because some of them, like (0,0,1) and (1,0,1), are starting to look physically similar. 10 passes leave 2048 possible domain states, many of which are indistinguishable.
-Writing zeroes over the file ten times is much better than writing zeroes over it once, but still leaves it in one of only two possible states, one for each value of the original bit. (Which are admittedly harder to tell apart, but you never know.)
-Do not allow the content of the file you're erasing to influence your decision of what bits to overwrite it with. You avoid a whole class of problems this way.
-Be aware that when you are writing random numbers, you are actually encrypting, not erasing, the file. The seed you used for your random number generator becomes a key for decrypting the file (given special equipment).
-You want to prevent the attacker from knowing what bits you wrote and in what order you wrote them. You will favor erasure over encryption if you can continually introduce entropy into the process. But entropy is scarce in most software environments. The variations in the timings of the drive's mechanical movements, ping responses from remote servers, mouse movements, and keypresses are well-known sources.
-Don't use a lousy random number generator. There are many ways for a random number generator to be bad. The simplest type produces numbers where n-tuples fall on a regular lattice when plotted in n dimensions. Generators like that are used a lot in scientific and graphics applications, but have no business being in security applications. If an attacker gains access to a few of the numbers in the generator's sequence, he can predict the rest of the sequence. They also repeat after generating at most 2^N numbers, where N is the number of bits in the generator's state.
-If applying this process to a single file, hide the size of the file.
-Ideally you should hide all traces of the file's existence. This means clean up after yourself by writing zeroes in the last several passes, so that even the domain randomness is physically removed (its presence implies that something was erased).
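The state counts in the list can be enumerated directly. This Python sketch models one original bit c followed by n overwrite bits, in the same (c, r1, r2, ...) notation; `histories` is a hypothetical helper name. With independent random passes there are 2^(n+1) possible histories, while repeating one fixed pattern leaves only the two that differ in c.

```python
from itertools import product


def histories(passes, pattern=None):
    """Every possible (c, w1, ..., wn) history for one original bit c.

    pattern=None: each pass writes an independent random bit.
    pattern=0 or 1: every pass writes that same fixed bit (e.g. zeroing).
    """
    if pattern is None:
        return list(product((0, 1), repeat=passes + 1))
    return [(c,) + (pattern,) * passes for c in (0, 1)]


for n in (1, 2, 10):
    print(n, "random passes:", len(histories(n)), "states")  # 4, 8, 2048
print("10 zero passes:", len(histories(10, pattern=0)), "states")  # 2
```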
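The lattice-type generator described in the list is typically a linear congruential generator (LCG). This Python sketch uses the widely published Numerical Recipes constants, though any LCG behaves the same way here: its whole internal state is the last number it returned, so an attacker who recovers one overwrite value (or the seed) can regenerate every value written after it, and the sequence must repeat within m = 2^32 steps. The class and variable names are illustrative.

```python
class LCG:
    """Linear congruential generator: x_next = (a * x + c) mod m.
    The constants are the widely published Numerical Recipes ones;
    any LCG shows the same weakness."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_value(self):
        self.state = (self.a * self.state + self.c) % self.m
        return self.state


# The wiper overwrites a sector with five "random" words...
wiper = LCG(seed=12345)
written = [wiper.next_value() for _ in range(5)]

# ...and an attacker who recovers just the first of them knows the
# generator's whole internal state, so the rest follow deterministically.
attacker = LCG(seed=written[0])
predicted = [attacker.next_value() for _ in range(4)]
print(predicted == written[1:])  # True
```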
Re:this is also a problem for warranty. (Score:3, Informative)
This is a big problem for DoD-type datacenters; for non-classified (as in "this stuff shouldn't get out") stuff, they open the disk up, sand-blast the platters to remove the magnetic material, then return the carcass to the manufacturer for a warranty claim. For the really secret stuff (as in "people will die if this stuff gets out"), they just destroy the disk completely, then buy a new drive.
Of course, if you kept all the data on the disk encrypted, you'd be fairly safe, but once you're making a warranty claim, the disk probably isn't working well enough for you to wipe using 'dd'...
Speaking of 'dd': Beware of sector remapping. Any sectors on the disk which the firmware has marked 'bad' won't be touched by any user-level command - and those 'bad' sectors could still be recovered if they open the disk up. For most people, 'leaking' a couple of sectors wouldn't be the end of the world, but for (say) VISA's customer records, there are probably a couple of valid CC numbers and other info in those sectors...
Wiping and physics (Score:3, Informative)
If you wipe, remember to take your device's physics into account.
Wipe it once when it is completely "cold" (the computer has been turned off for at least several hours), then wipe it again after it has been running for an hour or so, and wipe it a third time after you've given the disk some serious thrashing (that is, disk activity that moves the head around quite a bit).
The reason is temperature. Data is saved in circles on a magnetic medium. The read/write head has a certain width, and so do the tracks on the platter (the tracks have to be a bit wider than the head, to allow for thermal expansion so the head won't overwrite data on neighbouring tracks).
So, for some specialized data recovery company, it may even be possible to recover different data from the same track, because after a while of use, a track can look like this:
---------------- Outer track end
AAAAAAAAAAAAAAAA Older data 1
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB Actual data
BBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCC Older data 2
---------------- Inner track end
So, your drive will always read the data in 'B'. In 'C' there might still be data your computer saved when the drive had just spun up and was cold, while 'A' might still hold a copy of data that was written on very heavy disk activity when the drive was really hot.
To overwrite all of this data, you need to have the drive write in each of the temperature states it has been in during its life.
"Simple" writing might only destroy all 'B' data and leave all 'A' and 'C' data intact on the drive, where they can be recovered.
Secure Harddisk Eraser (boot floppy, GPL) (Score:3, Informative)
Secure Harddisk Eraser implements these 35 or 3 passes on a single floppy. Just boot from the floppy, wait 60 seconds, and the hard disk will start being erased.
The homepage [linux-kurser.dk]
Re:DPA (Score:3, Informative)
This is not good enough. Merely zeroing the data prevents "undeletes" and reading raw sector data in conventional ways, but there are tools to recover data that has been zeroed.
A simplistic way of thinking about it is this (this isn't remotely close to what really happens, but it's sufficient to get the point across): each bit on the drive can have a real value of 1-100. 1-50 is interpreted as zero, 51-100 as one. However, changing a bit from one to zero doesn't usually apply enough magnetic force to move it all the way down. So if you zero a bit that used to be a zero, it will end up very, very low, but if you zero a bit that used to be a one, it will end up in the upper part of the zero range, say a 40. Based on this, data recovery experts can get a pretty good picture of what the data used to be.
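The 1-100 picture can be turned into a small Python simulation. Everything in it is a hypothetical toy, not real drive physics: the levels, the threshold, and the assumed `WRITE_STRENGTH` (the fraction of the way one write pushes a cell toward its target, chosen so a zeroed one lands near 40 as in the example). The drive's own threshold read sees only zeros, while comparing analog levels recovers the old bits. Repeated passes shrink the residue, but since the model has no noise, they never erase it entirely; noise is what would eventually hide a residue that small.

```python
# Toy model of the 1-100 picture (hypothetical numbers, not real physics).
ZERO_LEVEL, ONE_LEVEL, THRESHOLD = 1.0, 100.0, 50.0
WRITE_STRENGTH = 0.6  # assumed fraction of the gap a single write closes


def write(level, bit):
    target = ONE_LEVEL if bit else ZERO_LEVEL
    return level + WRITE_STRENGTH * (target - level)


def read(level):
    # What the drive's own read head reports: a plain threshold decision
    return 1 if level > THRESHOLD else 0


def zero_out(levels, passes=1):
    for _ in range(passes):
        levels = [write(x, 0) for x in levels]
    return levels


def guess_previous(levels):
    # Attacker with analog equipment: cells sitting higher in the zero
    # range were probably ones before the wipe
    cut = (min(levels) + max(levels)) / 2
    return [1 if x > cut else 0 for x in levels]


original = [1, 0, 1, 1, 0, 0, 1, 0]
levels = [ONE_LEVEL if b else ZERO_LEVEL for b in original]
wiped = zero_out(levels)
print([read(x) for x in wiped])  # all zeros, as far as the drive knows
print(guess_previous(wiped))     # [1, 0, 1, 1, 0, 0, 1, 0]
for passes in (1, 3, 7):
    after = zero_out(levels, passes)
    # residue gap shrinks roughly 39.6 -> 6.34 -> 0.16
    print(passes, "pass(es): residue gap", round(max(after) - min(after), 2))
```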
The US DoD established a standard way back when for fully erasing data against these sorts of recovery techniques. I don't know how old it is, but it was well known in the early '90s for sure. It may not be safe any more. It specified overwriting the data a total of 7 times with specific patterns (something like 00, FF, 77, 11, EE, 77, 00, FF).
The moral of the story is: don't trust any software method for destroying data. Use a blowtorch or an electric sander on the raw platter surfaces after removing them from the drive casing. While you're at it, hit the electronics and the heads too. Or throw the whole thing in an incinerator that's hot enough to melt case, platters and all, into a lump of metal.
Re:Better yet! (Score:2, Informative)
It's a mini Linux distribution that boots off a floppy, then lets you pick which hard drive you want to wipe clean.
Re:DPA (Score:1, Informative)
The storage media most vulnerable to magnetic fields are cheap tapes and floppy disks.