Intel Patents On-Chip Cosmic Ray Detectors
holy_calamity writes "Intel has been awarded a patent for building cosmic ray detectors into chips, to guard against soft errors where a high-energy particle from space changes a value in a circuit. It's a problem that largely only affects RAM. As component sizes shrink further, "this problem is projected to become a major limiter of computer reliability in the next decade", says the patent. Intel's solution is to build in a detector that responds to cosmic errors by repeating the latest operation, reloading previous instructions, or rolling back to a previous state. You can also read the full patent."
Re:Mainframes allegedly already do this (Score:3, Informative)
Processor instruction retry (Score:3, Informative)
Re:How? (Score:4, Informative)
It's a lot less likely to cause problems than trying to guess which bit it was, and far less expensive than building a RAIMM(TM) to compensate for it.
If the problem is with RAM... (Score:4, Informative)
The tricky problem isn't RAM - it's computational elements. There is no single way to error-correct computational elements because they are so diverse. A multiplier would need different protection to an adder which is different from a shift-register. Hence, the idea of rolling back (say) the last instruction executed and having a "do-over".
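To make the "do-over" idea concrete, here is a minimal software sketch. It assumes a hypothetical `strike_detected` callback standing in for the on-chip detector; the names and the retry limit are illustrative and are not taken from the patent. The point is that the same wrapper covers a multiplier, an adder or a shifter without any per-unit error-correction logic.

```python
def run_with_retry(op, args, strike_detected, max_retries=3):
    """Run op(*args); if the (hypothetical) detector reports a strike
    during the operation, throw the result away and run it again."""
    for attempt in range(max_retries + 1):
        result = op(*args)
        if not strike_detected():
            # No strike reported: accept the result.
            return result, attempt
    raise RuntimeError("strike reported on every attempt")


# Simulated detector: reports a strike on the first attempt only.
flags = iter([True, False])
result, retries = run_with_retry(lambda a, b: a * b, (6, 7), lambda: next(flags))
print(result, retries)
```

Because `op` is opaque to the wrapper, nothing here depends on what kind of computational element produced the result.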
But for large arrays of homogeneous circuitry - like RAM - this doesn't seem worth the effort.
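For comparison, this is roughly what error correction on a homogeneous array looks like. The sketch uses the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit word; real ECC memory uses wider codes (typically single-error-correct, double-error-detect over 64 data bits), so treat this as an illustration of the principle only.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Return (data bits, 1-based position of the corrected bit, or 0 if none)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # equals the position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                        # simulated particle strike on one cell
recovered, fixed_at = hamming74_decode(stored)
print(recovered == word, fixed_at)
```

The same few XOR gates are replicated for every word in the array, which is why this approach suits RAM and does not carry over to irregular logic.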
Re:If the problem is with RAM... (Score:3, Informative)
On a regular basis I participate in the "radiation testing" of laptops intended for use on both the Space Shuttle and the International Space Station. This testing is normally done at Indiana University's Cyclotron Facility in Bloomington, Indiana. This past fall we completed testing on a group of laptops which implemented Intel's dual core Centrino Pro processors. Testing is conducted by hitting each of the components in the laptop with a proton beam while monitoring for induced errors.
While the results of the testing varied by memory manufacturer, by far the softest component in the laptop was the CPU itself. That said, these processors actually did fairly well compared to some of the previous generations of CPU chips we have tested over the years.
The rule that the smaller the die size, the greater the error rate does not seem to apply. For example, a number of years ago we tested a number of laptops using the Intel Pentium 3 mobile chip. Performance was so dismal that the decision was made not to procure any system based on that chip.
Later testing of laptops based on the Pentium 4 mobile chip showed a dramatic turnaround - the Pentium 4 mobile chip, with its smaller die size, actually outperformed both the Pentium 3 mobile and the Pentium 2 chips then used for on-orbit operations. Our group does not do any analysis of "why" a failure occurred, only the collection of data to assist in the selection of suitable devices for use on the Shuttle and the ISS.
The bottom line - die size is only one of the factors which come into play in determining how a chip will perform when hit by ionizing radiation. (One of my favorite theories is the declining delta between a 1 and a 0: in days gone by it could have been as much as five volts, but it is commonly down to around 1 volt in today's processors. This could bring any electrical disruption caused by a particle strike closer to the threshold of changing a one to a zero. But what do I know, I am just a software guy.)
The concept of building a detector into chips is interesting, but not enough detail is provided to make a judgment on its feasibility. Single Event Upsets (SEUs or bit-flips) are caused when a subatomic particle such as a proton or a heavy ion slams into the silicon, causing either an electrical disruption or damage to the silicon itself.
The key here is that these particles are so tiny compared to the circuit itself that, from my perspective, unless the "detector" somehow encapsulates the whole circuit it is unlikely to even notice the passage of a proton or other particle. To make detection even more difficult, remember that you are working in a three-dimensional environment: you cannot predict a particle's direction of travel, its energy level, or the location of a "strike".
However, the effects of radiation on electronic components are something we are going to have to learn to deal with someday, so this research is both exciting and worthwhile.
Re:Mainframes allegedly already do this (Score:4, Informative)
so clearly to a human sized target, the impact ratio is significant.
we used to detect 1 or 2 hits a week (Score:4, Informative)
More for laughs than anything else, I started logging them and found that a server with 16GB got maybe one or two hits per week. After that I started to take ECC seriously - for professional quality servers.
You probably don't need it for the domestic appliance quality stuff that people run at home - but for real work, get some decent kit.
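For anyone who wants to compare that figure with published numbers, one to two hits per week on 16GB can be converted into FIT (failures per billion device-hours) per megabit. The sketch below assumes 16GB means 16 x 2^30 bytes and that every logged hit was an independent single-bit upset; neither assumption comes from the logs themselves.

```python
HOURS_PER_WEEK = 7 * 24


def fit_per_mbit(hits_per_week, gigabytes):
    """Convert an observed hit rate into FIT per Mbit.

    FIT = failures per 10**9 device-hours.
    Assumes binary units: 1 GB = 1024 MB, 1 MB = 8 Mbit.
    """
    mbits = gigabytes * 1024 * 8
    hits_per_hour = hits_per_week / HOURS_PER_WEEK
    return hits_per_hour / mbits * 1e9


low = fit_per_mbit(1, 16)
high = fit_per_mbit(2, 16)
print(f"{low:.0f} to {high:.0f} FIT per Mbit")
```

Under those assumptions the observation works out to somewhere between roughly 45 and 91 FIT per Mbit.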
Re:How? (Score:3, Informative)
Re:How? (Score:5, Informative)
With cosmic rays, it's not just "gone". Instead, you get a shower of new energetic particles generated by the collision which compounds the risk of operational errors. The patent specifically mentions alpha particles knocked out of the atoms in the chip by the ray which travel through the circuits causing havoc.
The patent also mentions that the detector may sense side effects of collision (such as voltage spikes) rather than the ray particle itself. Thus, the damage has already been done by the time the detector sees the event.
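Since the alarm arrives after the damage, retrying only the current operation is not enough: the machine has to go back to a state saved before the strike. Here is a minimal sketch of that, assuming a fixed, known detector latency measured in steps and a checkpoint before every step. The function name, the latency model and the simulated bit flip are all made up for illustration and do not describe the patent's mechanism.

```python
def run_with_rollback(program, state, strike_at=None, latency=2):
    """Run a list of state -> state steps, checkpointing before each one.

    strike_at: index of the step whose result gets a simulated bit flip.
    latency:   steps that pass before the simulated detector raises its alarm.
    Returns (final state, number of rollbacks).
    """
    checkpoints = {}      # step index -> state before that step
    pc = 0
    alarm_at = None       # step at which the pending alarm will fire
    rollbacks = 0
    # Keep stepping until the program is finished AND no alarm is pending,
    # so a result is only trusted once the detector has had time to fire.
    while pc < len(program) or alarm_at is not None:
        checkpoints[pc] = state
        if pc < len(program):
            state = program[pc](state)
        if pc == strike_at:
            state ^= 1 << 3               # simulated upset: flip bit 3
            alarm_at = pc + latency
            strike_at = None              # the strike happens only once
        if alarm_at is not None and pc >= alarm_at:
            pc -= latency                 # back to the step that was hit
            state = checkpoints[pc]       # state saved before the strike
            alarm_at = None
            rollbacks += 1
            continue
        pc += 1
    return state, rollbacks


program = [lambda s: s + 1, lambda s: s * 2, lambda s: s + 3,
           lambda s: s * 5, lambda s: s - 4]
clean, _ = run_with_rollback(program, 0)
recovered, rollbacks = run_with_rollback(program, 0, strike_at=1)
print(clean, recovered, rollbacks)
```

The sketch keeps every checkpoint forever; a real design would only need to keep as many as the detector latency requires.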