Patents Intel

Intel Patents On-Chip Cosmic Ray Detectors

holy_calamity writes "Intel has been awarded a patent for building cosmic ray detectors into chips, to guard against soft errors where a high energy particle from space changes a value in a circuit. It's a problem that largely only affects RAM. As component sizes shrink further, "this problem is projected to become a major limiter of computer reliability in the next decade", says the patent. Intel's solution is to build in a detector that responds to cosmic errors by repeating the latest operation, reloading previous instructions, or rolling back to a previous state. You can also read the full patent."

  • by confused one ( 671304 ) on Saturday March 08, 2008 @12:10PM (#22687098)
    Actually you can prove cosmic rays cause memory errors. IBM did so in the 90's; there was mention of this (and a link) in the article. As memory cells become smaller they WILL become sensitive to ionizing radiation. Intel seems to think we will get there sometime in the next decade or so.
  • by br1an.warner ( 1089965 ) on Saturday March 08, 2008 @12:17PM (#22687130)
    POWER6 has actually been shipping with this for a while - if an instruction fails (cosmic ray or not, although in terms of random bit-flipping events they account for a large percentage), it gets automatically retried, transparently to the rest of the system. Without this sort of thing you generally take a hard fault - so this type of protection is great to see. Same thing on a SPARC64, incidentally (but not UltraSPARC - i.e. Niagara or children). What sets the POWER6 apart from both SPARC64 and this patent is that if the instruction fails repeatedly (possibly indicating a chip fault), in many cases it can actually back the instruction out of the failing core and slap it onto another core, also transparently and avoiding a hard crash. Someone noted that this has been done on mainframes for years - yup, also true. This is another case of UNIX-class technology making inroads up the platform stack.
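
To make the retry-then-migrate behaviour described above concrete, here is a minimal software sketch. It is only an illustration of the control flow: real processors do this in hardware, and the names `SoftError`, `run_with_retry` and `make_flaky` are hypothetical helpers invented for this example, not anything taken from POWER6, SPARC64 or the patent.

```python
class SoftError(Exception):
    """Raised when a (simulated) transient fault corrupts an operation."""


def run_with_retry(op, cores, max_retries=3):
    """Run op on the first core; retry on faults, then move to the next core.

    Returns (result, core_used, failed_attempts_on_that_core).
    """
    for core in cores:
        for attempt in range(max_retries):
            try:
                return op(core), core, attempt
            except SoftError:
                continue  # transient fault: retry on the same core
        # Repeated failures hint at a hard fault, so try another core.
    raise RuntimeError("operation failed on every core")


def make_flaky(fail_counts):
    """Build an op that fails a fixed number of times per core (test helper)."""
    remaining = dict(fail_counts)

    def op(core):
        if remaining.get(core, 0) > 0:
            remaining[core] -= 1
            raise SoftError(core)
        return 6 * 7

    return op


# Two transient faults on core 0: the third attempt succeeds in place.
print(run_with_retry(make_flaky({0: 2}), cores=[0, 1]))  # (42, 0, 2)
# Core 0 keeps failing: the work is moved to core 1.
print(run_with_retry(make_flaky({0: 5}), cores=[0, 1]))  # (42, 1, 0)
```

The retry limit of three is an arbitrary assumption; the point is only that a bounded number of retries separates "transient" from "probably broken".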
  • Re:How? (Score:4, Informative)

    by hedwards ( 940851 ) on Saturday March 08, 2008 @12:24PM (#22687162)
    They didn't. They've created a detector which works out whether the chip was hit by a cosmic ray or not. Then the RAM is somehow restored to the state previous to the last operation, and that operation is then repeated. I'm not even sure that "hit" is the right word: they've developed a detector that is capable of knowing when a cosmic ray travels through the same space as the chip. I don't know that they care whether the ray actually hit something or just traveled through the open space between the atoms.

    It's a lot less likely to cause problems than trying to guess which bit it was, and far less expensive than building a RAIMM(TM) to compensate for it.
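
The detect, restore, repeat sequence described in this comment can be sketched in a few lines. This is a toy model under stated assumptions: the class `CheckpointedMachine` and the `strike_detected` callback are hypothetical names, the "detector" is just a function returning True or False, and nothing here reflects how the patent actually saves or restores state.

```python
import copy


class CheckpointedMachine:
    """Toy machine that snapshots its state before every operation."""

    def __init__(self, state):
        self.state = state

    def execute(self, op, strike_detected):
        """Apply op; while the detector reports a strike, roll back and replay.

        Returns the number of replays that were needed.
        """
        checkpoint = copy.deepcopy(self.state)  # state before the operation
        op(self.state)
        replays = 0
        while strike_detected():
            # The result is suspect: restore the snapshot and redo the op.
            self.state = copy.deepcopy(checkpoint)
            op(self.state)
            replays += 1
        return replays


def add_five(state):
    state["acc"] += 5


machine = CheckpointedMachine({"acc": 0})
detector = iter([True, False]).__next__  # one strike, then quiet
print(machine.execute(add_five, detector), machine.state)  # 1 {'acc': 5}
```

Note that the machine never needs to know which bit was disturbed: it discards the suspect result wholesale, which is the advantage the comment points out over trying to guess the damaged bit.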
  • by sbaker ( 47485 ) * on Saturday March 08, 2008 @12:41PM (#22687258) Homepage
    For RAM - there is really no problem - just use error checking. It's got to be easier to add an extra couple of bits to the width of your RAM to permit error-correction than to have a cosmic ray detector for every single bit.

    The tricky problem isn't RAM - it's computational elements. There is no single way to error-correct computational elements because they are so diverse. A multiplier would need different protection to an adder which is different from a shift-register. Hence, the idea of rolling back (say) the last instruction executed and having a "do-over".

    But for large arrays of homogeneous circuitry - like RAM - this doesn't seem worth the effort.
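
As an illustration of the "extra couple of bits" idea for RAM, here is a sketch of the classic Hamming(7,4) code, which adds three check bits to four data bits and can correct any single flipped bit. This is a textbook code chosen for brevity, not what any particular memory module uses (ECC DIMMs typically protect much wider words), and the function names are made up for this example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) as a 7-bit codeword (list, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused
    code[3], code[5], code[6], code[7] = d     # data bits
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4,5,6,7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome); syndrome is the corrected position, 0 if clean."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1        # flip the single bad bit back
    d = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(d)), syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                       # simulate a particle flipping position 5
print(hamming74_decode(word))      # (11, 5)
```

Because every cell of a memory array is protected by the same code, one small encoder/decoder serves the whole array, which is why this approach suits homogeneous circuitry better than diverse computational logic.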
  • by saltydog56 ( 1135213 ) on Saturday March 08, 2008 @01:32PM (#22687488)
    You are exactly right that the real problem is not the functionality of the memory chips, but rather the processor chips, for a number of reasons. (Having said that, it is very likely that a significant portion of the problem of soft CPU chips is the on-chip level one cache.)

    On a regular basis I participate in the "radiation testing" of laptops intended for use on both the Space Shuttle and the International Space Station. This testing is normally done at Indiana University's Cyclotron Facility in Bloomington, Indiana. This past fall we completed testing on a group of laptops which implemented Intel's dual core Centrino Pro processors. Testing is conducted by hitting each of the components in the laptop with a proton beam while monitoring for induced errors.

    While the results of the testing varied by memory manufacturer, by far the softest component in the laptop was the CPU itself. That said, these processors actually did fairly well compared to some of the previous generations of CPU chips we have tested over the years.

    The rule that the smaller the die size, the greater the error rate does not seem to apply. For example, a number of years ago we tested a number of laptops using the Intel Pentium 3 mobile chip. Performance was so dismal that the decision was made not to procure any system based on that chip.

    Later testing of laptops based on the Pentium 4 mobile chip showed a dramatic turnaround - the Pentium 4 mobile chip, with its smaller die size, actually outperformed both the Pentium 3 mobile and the Pentium 2 chips then used for on-orbit operations. Our group does not do any analysis of "why" a failure occurred, only the collection of data to assist in the selection of suitable devices for use on the Shuttle and the ISS.

    The bottom line - die size is only one of the factors which come into play in determining how a chip will perform when hit by ionizing radiation. (One of my favorite theories is the declining delta between a 1 and a 0 - in days gone by it could have been as much as five volts, but it is commonly down to around 1 volt in today's modern processors - this could serve to bring any electrical disruption caused by a particle strike closer to the threshold of changing a one to a zero - but what do I know, I am just a software guy.)

    The concept of building a detector into chips is interesting, but not enough detail is provided to make a judgment on its feasibility. Single Event Upsets (SEUs or bit-flips) are caused when a subatomic particle such as a proton or a heavy ion slams into the silicon, causing either an electrical disruption or damage to the silicon itself.

    The key here is that these particles are so tiny compared to the circuit itself that, from my perspective, unless the "detector" somehow encapsulates the whole circuit it is unlikely to even notice the passage of a proton or other particle. To make detection even more difficult, you must remember that you are working in a three-dimensional environment - you cannot predict the particle's direction of travel, its energy level, or the location of a "strike".

    However, the effects of radiation on electronic components are something we are going to have to learn to deal with someday, so this research is both exciting and worthwhile.
  • by kesuki ( 321456 ) on Saturday March 08, 2008 @01:36PM (#22687522) Journal
    well, if the detector is the size of a penny, then yes probably pretty rare to detect cosmic rays... but if the detector is the size of a pc case, it will get hits every few seconds. cosmic rays ARE very common, and not all of them are magnetically deflected, or stopped by the atmosphere. they just happen to be very small, and the frequency of hits to a small target is less than to a large target. about 8% of the radiation humans are exposed to each year is from cosmic rays. http://en.wikipedia.org/wiki/Cosmic_ray

    so clearly to a human sized target, the impact ratio is significant.
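
The size argument above is simple proportionality, and a rough calculation shows how it scales. The sketch below assumes the common rule of thumb of about one cosmic-ray muon per square centimetre per minute through a horizontal surface at sea level; the coin diameter and case footprint are illustrative guesses, and the function name is hypothetical. Most of these particles pass straight through without upsetting anything, so this estimates particle crossings, not soft errors.

```python
import math

# Assumption: rule-of-thumb sea-level muon flux through a horizontal surface.
MUONS_PER_CM2_PER_MIN = 1.0


def hits_per_second(area_cm2, flux_per_cm2_per_min=MUONS_PER_CM2_PER_MIN):
    """Expected particle crossings per second for a flat horizontal target."""
    return area_cm2 * flux_per_cm2_per_min / 60.0


penny_area = math.pi * (1.9 / 2) ** 2  # assumed 19 mm diameter coin, in cm^2
case_area = 20.0 * 45.0                # assumed 20 cm x 45 cm case footprint

print(f"penny:   {hits_per_second(penny_area) * 60:.1f} per minute")
print(f"pc case: {hits_per_second(case_area):.1f} per second")
```

Under these assumptions the count scales linearly with area, so a target a few hundred times larger sees a few hundred times more crossings.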
  • by petes_PoV ( 912422 ) on Saturday March 08, 2008 @02:01PM (#22687624)
    Just to quantify the effect, the Sun E10000 Starfires we used a few years ago had ECC error counters built into the operating system. When I asked what they were for, the salesman told me straight that they detected/corrected cosmic ray hits.

    More for laughs than anything else, I started logging them and found that a server with 16GB got maybe one or two hits per week. After that I started to take ECC seriously - for professional quality servers.

    You probably don't need it for the domestic appliance quality stuff that people run at home - but for real work, get some decent kit
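
To put the "one or two hits per week on 16GB" figure on a per-bit footing, it can be converted to FIT per megabit, where one FIT is one failure per billion device-hours. This is back-of-the-envelope arithmetic on a single anecdotal observation, so treat the result as an order of magnitude at best; the function name is made up for this example, and a megabit is taken as 2^20 bits.

```python
def fit_per_mbit(errors, hours, memory_bytes):
    """Convert an observed error count into FIT per Mbit (failures per 1e9 h)."""
    mbits = memory_bytes * 8 / 2**20           # memory size in megabits
    errors_per_billion_hours = errors / hours * 1e9
    return errors_per_billion_hours / mbits


HOURS_PER_WEEK = 7 * 24
SERVER_BYTES = 16 * 2**30                      # the 16GB server from the comment

for weekly_errors in (1, 2):
    rate = fit_per_mbit(weekly_errors, HOURS_PER_WEEK, SERVER_BYTES)
    print(f"{weekly_errors} per week -> {rate:.0f} FIT/Mbit")
```

With these inputs the observed range works out to roughly 45 to 91 FIT per Mbit. At a fixed per-bit rate the weekly count grows in proportion to installed memory.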

  • Re:How? (Score:3, Informative)

    by deblau ( 68023 ) <slashdot.25.flickboy@spamgourmet.com> on Saturday March 08, 2008 @02:13PM (#22687672) Journal
    Next time, please read before posting. Oh wait, I must be new here.

    In some embodiments, the cosmic ray detector detects the debris tract of a cosmic ray. In some embodiments, the cosmic ray detector includes large, distributed P-N junctions to gather charge. In some embodiments, the cosmic ray detector includes optical cosmic ray detectors embedded into some optically clear supporting insulator such as diamond thermal spreaders. For example, one million electron-hole pairs may create a large number of recombination photons. In some embodiments, a scintillator panel (which gives off small flashes of light (photons) when a charge particle passes through it), a light guide to direct light from the scintillator, and photon detectors may be used.

    In some embodiments, the cosmic ray detectors include an array of micro-electro-mechanical systems (MEMS). MEMS cosmic ray detectors may be an integration of mechanical elements, sensors, actuators, and electronics on a very small scale. The cosmic ray detectors may include tips or other strain detectors to detect the shockwave from the nuclear collision by means of acoustic waves propagating through the substrate.

  • Re:How? (Score:5, Informative)

    by Waffle Iron ( 339739 ) on Saturday March 08, 2008 @02:29PM (#22687744)

    but if you can interact with it then it's not a problem, because once it interacts with something then it's gone.

    With cosmic rays, it's not just "gone". Instead, you get a shower of new energetic particles generated by the collision which compounds the risk of operational errors. The patent specifically mentions alpha particles knocked out of the atoms in the chip by the ray which travel through the circuits causing havoc.

    The patent also mentions that the detector may sense side effects of collision (such as voltage spikes) rather than the ray particle itself. Thus, the damage has already been done by the time the detector sees the event.
