Intel Patents On-Chip Cosmic Ray Detectors

Posted by CmdrTaco on Sat Mar 08, 2008 10:17 AM
from the is-it-april-already dept.
holy_calamity writes "Intel has been awarded a patent for building cosmic ray detectors into chips, to guard against soft errors where a high energy particle from space changes a value in a circuit. It's a problem that largely only affects RAM. As component sizes shrink further, "this problem is projected to become a major limiter of computer reliability in the next decade", says the patent. Intel's solution is to build in a detector that responds to cosmic errors by repeating the latest operation, reloading previous instructions, or rolling back to a previous state. You can also read the full patent."

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • But you can't really verify it because those events are so rare. It seems to me that Intel's innovation is to use some sort of detector, instead of using two or more chips and a comparator. It's probably way cheaper, but it won't work if the majority of unexplainable events are not, in fact, caused by cosmic rays but by some other effect (perhaps something temperature-related).
    • I saw a display in the visitors' center at CERN that detected cosmic rays. A cloud chamber, maybe.

      Either way, the... 2m by 2m (IIRC) display would detect cosmic rays about once every 2 seconds. This would mean my PC case is perforated by cosmic rays several times each minute. That's not rare.
    • Re: (Score:3, Informative)

      Actually you can prove cosmic rays cause memory errors. IBM did so in the 90's; there was mention of this (and a link) in the article. As memory cells become smaller they WILL become sensitive to ionizing radiation. Intel seems to think we will get there sometime in the next decade or so.
      • that would be why servers etc all use ECC type RAM, with that extra parity bit, it's easy to detect the corruption of memory, and re-do whatever needs to be redone. the difference is that now intel is putting in some sort of cosmic ray detector, rather than detecting what happens to the ram...

        it seems painfully inefficient to 'redo' stuff that doesn't seem to be wrong just because a cosmic ray was detected. it's not like cosmic rays can be easily blocked, either, you could put the computers under a mountain
        • by SeekerDarksteel (896422) on Saturday March 08 2008, @12:11PM (#22687400)
          it seems painfully inefficient to 'redo' stuff that doesn't seem to be wrong just because a cosmic ray was detected.

          1) The likelihood of a cosmic ray is ridiculously small. So small in fact that the cost of rewinding progress when they are detected would be completely unnoticeable.

          2) We *do* have the ability to package CPUs such that they are protected from cosmic rays. The problem is that the packages are so large and expensive that no one would buy them given the current probability of soft errors.

          So the solution is most definitely NOT to stop shrinking transistors. Even in 10 process technology generations, the mean time to a soft error actually affecting a bit on a CPU is something like 1 million hours. Never mind whether or not that particular soft error is critical.
          • by kesuki (321456) on Saturday March 08 2008, @12:36PM (#22687522) Journal
            well, if the detector is the size of a penny, then yes probably pretty rare to detect cosmic rays... but if the detector is the size of a pc case, it will get hits every few seconds. cosmic rays ARE very common, and not all of them are magnetically deflected, or stopped by the atmosphere. they just happen to be very small, and the frequency of hits to a small target is less than to a large target. about 8% of the radiation humans are exposed to each year is from cosmic rays. http://en.wikipedia.org/wiki/Cosmic_ray [wikipedia.org]

            so clearly to a human sized target, the impact ratio is significant.
          • by petes_PoV (912422) on Saturday March 08 2008, @01:01PM (#22687624)
            Just to quantify the effect, the Sun E10000 Starfires we used a few years ago had ECC error counters built into the operating system. When I asked what they were for the salesman told me straight that they detected/corrected cosmic ray hits.

            More for laughs than anything else, I started logging them and found that a server with 16GB got maybe one or two hits per week. After that I started to take ECC seriously - for professional quality servers.

            You probably don't need it for the domestic appliance quality stuff that people run at home - but for real work, get some decent kit
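To put the figure above on a per-bit footing, here is a rough back-of-envelope sketch. The inputs (16GB, one or two corrected errors per week) come from the comment; treating a GB as 2**30 bytes, ignoring the ECC check bits, and the function names are assumptions made purely for illustration.

```python
# Back-of-envelope conversion of "one or two hits per week on 16GB" into a
# per-bit rate. Assumptions: 1 GB = 2**30 bytes, ECC check bits ignored.

HOURS_PER_WEEK = 7 * 24


def errors_per_bit_hour(errors_per_week: float, gigabytes: float) -> float:
    """Average corrected errors per bit per hour."""
    bits = gigabytes * 2**30 * 8
    return errors_per_week / HOURS_PER_WEEK / bits


def fit_per_megabit(errors_per_week: float, gigabytes: float) -> float:
    """FIT (failures per 10**9 device-hours), normalised per Mbit of memory."""
    return errors_per_bit_hour(errors_per_week, gigabytes) * 2**20 * 1e9


for hits in (1, 2):
    print(f"{hits}/week on 16GB -> {fit_per_megabit(hits, 16):.1f} FIT per Mbit")
```

Under those assumptions the logged rate works out to a few tens of FIT per Mbit, which only says something about that one machine at that one site; it is not a general figure for DRAM.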

          • http://uksbsguy.com/blogs/doverton/archive/2007/05/23/microsoft-says-pcs-may-need-dram-upgrade-to-ecc-ram.aspx [uksbsguy.com]

            Microsoft's XP crash analysis early in this decade concluded that PCs always left on tended to crash unexpectedly. Dump analysis showed strange values in key OS variables, and cosmic rays (or other bit-blasting particles) were among the likely sources. The conclusion was so clear that Microsoft floated the idea (see URL above) that Vista-generation PCs should use Error-Correcting Code (ECC) mem

          • Maybe the solution will come when we abandon charge-states as our means of information processing, and instead shift into photonics. These components will then be immune to ionizing radiation.
          • I work for Sun Microsystems, and it's not that rare. Perhaps it's because I see the errors when they happen on all the systems we have across North America as people report them, but I get at least a few hits a day regarding single bit ecache errors. I would say they happen probably once, in a five year period, for each processor. Well, maybe less. Actually, the more I think about the amount of systems we have out there, and that there's a few a day... Hell, I guess it is pretty rare. But it happens!
      • As memory cells become smaller they WILL become sensitive to ionizing radiation.

        I have heard this claim before and I have yet to see any kind of credible argument for this. The ionization energy loss of a charged particle penetrating matter is proportional to the distance travelled -- so a smaller memory cell may need less energy to "flip" it, but it will also receive less energy by a passing particle. Thus if the aspect ratio (thickness to length) doesn't change, I see no particular reason why smaller transistors should be "more sensitive" to cosmics. To the contrary: a smaller are

  • How? (Score:5, Interesting)

    by mistersooreams (811324) on Saturday March 08 2008, @10:38AM (#22686944) Homepage
    How did they manage to build a detector that can work out whether the cosmic rays collided with the actual bits (no pun intended) that hold the data? According to the oracle [wikipedia.org], cosmic rays collide with nuclei in an essentially random way, so there's no way a detector could just see a ray passing through and know whether it was on a collision course. Perhaps they are detecting the pions and other subatomic particles that result from a collision actually occurring? If they've found a way to do that then it sounds fairly ingenious to me and a well-deserved patent.
    • Re:How? (Score:4, Informative)

      by hedwards (940851) on Saturday March 08 2008, @11:24AM (#22687162)
      They didn't, they've created a detector which works out whether the chip was hit by a cosmic ray or not. Then the ram is somehow restored to the state previous to the last operation and that operation is then repeated. I'm not even sure that hit is the right word, they've developed a detector that is capable of knowing when a cosmic ray travels through the same space as the chip, I don't know that they care whether or not the ray actually hit something or just traveled through the open space between the atoms.

      It's a lot less likely to cause problems than trying to guess which bit it was, and far less expensive than building a RAIMM(TM) to compensate for it.
      • [quote]They didn't, they've created a detector which works out whether the chip was hit by a cosmic ray or not.[/quote]

        As the GP said, there is no way of knowing whether a cosmic ray passed through you or not. The cosmic ray could easily just smash your bit to a new, random state and pass happily unhindered through the actual detector thingy. Only way to improve the situation would be to build a large detector volume (at least a couple cm^3).
      • I think you missed the point the parent was trying to make. There's a catch-22 going on here: you can only detect a cosmic ray by interacting with it, but if you can interact with it then it's not a problem, because once it interacts with something then it's gone. All in all, this sounds suspiciously like a patent on parity ram disguised as something else.
        • Re:How? (Score:5, Informative)

          by Waffle Iron (339739) on Saturday March 08 2008, @01:29PM (#22687744)

          but if you can interact with it then it's not a problem, because once it interacts with something then it's gone.

          With cosmic rays, it's not just "gone". Instead, you get a shower of new energetic particles generated by the collision which compounds the risk of operational errors. The patent specifically mentions alpha particles knocked out of the atoms in the chip by the ray which travel through the circuits causing havoc.

          The patent also mentions that the detector may sense side effects of collision (such as voltage spikes) rather than the ray particle itself. Thus, the damage has already been done by the time the detector sees the event.

      • Cosmic ray impacts can inject carriers into the substrate which is a detectable condition. In older processes and especially in CMOS processes that do not use well isolation based on an insulator, carrier injection into the substrate can cause problems from random bias changes to destructive SCR latchup. You can see this in some analog processes where multiple circuits share the same substrate when you overdrive an input or output pin forward biasing the protection diode injecting carriers into the subst
    • Re: (Score:3, Informative)

      Next time, please read before posting. Oh wait, I must be new here.

      In some embodiments, the cosmic ray detector detects the debris tract of a cosmic ray. In some embodiments, the cosmic ray detector includes large, distributed P-N junctions to gather charge. In some embodiments, the cosmic ray detector includes optical cosmic ray detectors embedded into some optically clear supporting insulator such as diamond thermal spreaders. For example, one million electron-hole pairs may create a large number of reco

    • You're forgetting about what happens after an interaction of a cosmic ray with an atom. In the case of the ray being a neutron, the interaction will result in a lot of kinetic energy imparted to the nucleus (called the primary knock-on atom) which will then tear off a bunch of electrons as it slows down (a heavy charged particle with a given energy will have a well defined range in matter, which is why ion implantation superseded diffusion in chip fabrication). The range of the nucleus will likely be much l
  • by beefsprocket (1152865) on Saturday March 08 2008, @10:40AM (#22686952)
    Cosmic ray detector certainly makes for better marketing hype than ECC.
    • Cosmic ray detector certainly makes for better marketing hype than ECC.

      Yeah, its utterly ridiculous to believe that strange radiation from outer space can mes#[!^ ~` . '
           
  • by BenJeremy (181303) on Saturday March 08 2008, @10:41AM (#22686966)
    It's just as likely registers could be corrupted, or the "rollback" state. Wouldn't it be easier to have, I dunno, maybe error correction/detection involved, instead of some arbitrary cosmic ray detector?

    Sometimes the more "esoteric" designers attempt to get, the more potential for disaster they create.

    Cosmic ray detection would be far better for random number generation, than anything else.
  • by elrous0 (869638) * on Saturday March 08 2008, @10:44AM (#22686974)
    I know at least four people who REALLY could have used this. Oh well, too late now.
  • It seems to me that, even if the individual detectors are very simplistic, and the geocoding of inputs is very rough, there would probably be some interesting scientific uses for a multi-million node planet-sized distributed cosmic ray detector.

    Does anyone in a relevant field see a good use for this?
    • I work on distributed cosmic ray detectors. The patent is very sparse with details, so it's difficult to say much about it. The biggest problems I see are timing and data analysis. The detectors need to have a synced clock to within a few nanoseconds. This is possible with GPS if you know all the circuitry and the delays therein. But I don't think you could do it in normal pc's. Now each pc needs at least two detectors to do some triggering before you send the data. If you don't you'll end up with huge amou
  • by canada_dry (830702) on Saturday March 08 2008, @11:06AM (#22687074) Journal
    It won't take long for someone to figure out how to detect the gamma errors and create what amounts to a Geiger counter on laptop computers. If this bill passes http://www.villagevoice.com/news/0803,thompson,78873,2.html [villagevoice.com] will everyone be required to get a permit for their laptop computers? ;-)
  • by quo_vadis (889902) on Saturday March 08 2008, @11:11AM (#22687104) Journal
    Currently, chips (both computational and memory) are protected against soft errors using multiple methods. There are rad hardening methods (both hardware and software), and most of the latest research involves using error correcting codes. Simply duplicating the output and comparing can only detect errors in one bit. The more times you duplicate, the more you can detect (it progresses as n-1), and the max length of error that can be corrected is half that. However, duplication takes a lot of space, so generally other codes such as Hamming or BCH codes are used.

    The main problem with using codes is that cosmic ray errors cause what are called single event upsets (SEUs), and most codes cannot detect 100% of errors where the Hamming weight of the error (the number of ones in the error vector) is larger than the code was designed for. The problem comes when the SEU manifests itself as a multi-bit fault and the error vector cannot be detected by the code. SEUs are the most common type of error in space applications: see http://www.eas.asu.edu/~holbert/eee460/see.html [asu.edu]

    The contribution of the cosmic error detector is that if you know you have a cosmic ray at some point in time, you can flush and redo your computation (for computation channels, e.g. microprocessors) or flush that line in memory (for memory channels) in case of SEUs, and that is a pretty big deal.
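The point about Hamming weight can be made concrete with a small sketch. It uses the textbook Hamming(7,4) code, chosen here only because it is small; it is not claimed to be what any particular chip or the patent uses, and the helper names are made up. A single flipped bit is corrected, while a two-bit upset exceeds what the code was designed for and is silently "corrected" into the wrong data.

```python
# Textbook Hamming(7,4): parity bits sit at positions 1, 2 and 4 (1-indexed),
# data bits at positions 3, 5, 6 and 7. Illustrative sketch only.

def encode(data):
    """Encode 4 data bits into a 7-bit codeword."""
    c = [0] * 8                      # index 0 unused, positions 1..7
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def syndrome(codeword):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            s ^= position
    return s


def decode(codeword):
    """Flip the bit the syndrome points at (if any) and return the data bits."""
    cw = list(codeword)
    s = syndrome(cw)
    if s:
        cw[s - 1] ^= 1
    return [cw[2], cw[4], cw[5], cw[6]]


def flip(codeword, *positions):
    """Return a copy of the codeword with the given 1-indexed bits inverted."""
    out = list(codeword)
    for p in positions:
        out[p - 1] ^= 1
    return out


data = [1, 0, 1, 1]
cw = encode(data)
print("single-bit upset recovered:", decode(flip(cw, 5)) == data)
print("double-bit upset recovered:", decode(flip(cw, 2, 5)) == data)
```

This is the gap a separate upset detector would be meant to cover: if you know independently that something hit the chip, you do not have to trust a decode that may have been pushed past its design limit.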
    • by museumpeace (735109) on Saturday March 08 2008, @12:05PM (#22687364) Journal
      you mention rad hardening...some of that tech. would have been first needed in military satellites and so not necessarily divulged in a patent. One kind of rad hardened circuit, which used to be prohibitive but has become practical with advances in solid state fab, requires a particular kind of redundancy. It has been described in prior literature kinda like this: build a functional duplicate of each storage or processing element in a parallel layer so that ...
      • each element is aligned right over its corresponding element in the 2nd layer.
      • bias the logic of one layer such that the burst of conduction band electrons that would accompany a gamma ray hit will report a false "1" if anything.
      • bias the corresponding logic in the other layer so that that same burst of electrons...which will befall it at exactly the same time and place as its aligned circuit...will fault to a "0", if anything
      • gate the primary layer's output by the !XOR of the two layers: whatever the state of the circuit was supposed to be, it will be disabled until the transient from the gamma ray has been quenched
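The four steps above can be sketched as a tiny truth-table model. Everything here (function names, the idea of representing an upset as a boolean per layer) is a hypothetical illustration of the scheme as described in this comment, not a model of any real device.

```python
# Toy model of the dual-layer scheme: layer A can only be upset towards 1,
# layer B only towards 0, and the output is gated by XNOR of the two layers.

def layer_a(stored: int, upset: bool) -> int:
    """Layer biased so that a hit reads as a false 1, if anything."""
    return 1 if upset else stored


def layer_b(stored: int, upset: bool) -> int:
    """Layer biased so that a hit reads as a false 0, if anything."""
    return 0 if upset else stored


def gated_read(stored: int, upset_a: bool, upset_b: bool):
    """Return the stored bit, or None while the two layers disagree."""
    a = layer_a(stored, upset_a)
    b = layer_b(stored, upset_b)
    valid = not (a ^ b)              # XNOR: output enabled only on agreement
    return a if valid else None


for stored in (0, 1):
    for upset_a in (False, True):
        for upset_b in (False, True):
            print(stored, upset_a, upset_b, "->", gated_read(stored, upset_a, upset_b))
```

In this model the gated output is either the correct bit or "disabled", never a wrong bit: an upset that matters always pushes the two layers in opposite directions, so they disagree until the transient has passed.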
  • by br1an.warner (1089965) on Saturday March 08 2008, @11:17AM (#22687130)
    POWER6 has actually been shipping with this for a while - if an instruction fails (cosmic ray or not, although in terms of random bit-flipping events they account for a large percentage), it gets automatically retried, transparently to the rest of the system. Without this sort of thing you generally take a hard fault - so this type of protection is great to see. Same thing on a SPARC64, incidentally (but not UltraSPARC - ie Niagara or children). What sets the POWER6 apart from both SPARC64 and this patent is if that instruction fails repeatedly (possibly indicating a chip fault), in many cases it can actually back the instruction out of the failing core and slap it onto another core, also transparently and avoiding a hard crash. Someone noted that this has been done on mainframes for years - yup, also true. This is another case of UNIX-class technology making inroads up the platform stack.
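The retry-then-migrate behaviour described above can be sketched in software terms. This is only an analogy under stated assumptions: the names (TransientFault, run_with_retry, make_flaky_op) and the retry count are hypothetical, and real hardware does this below the instruction level, not with exceptions.

```python
# Sketch of "retry transparently, then move to another core".
# All names and the retry budget are illustrative assumptions.

class TransientFault(Exception):
    """Raised when an operation's result fails a hardware-style check."""


def run_with_retry(op, cores, retries_per_core=2):
    """Try op on each core in turn; return (core, result) on first success."""
    for core in cores:
        for _ in range(1 + retries_per_core):
            try:
                return core, op(core)
            except TransientFault:
                continue             # retry on the same core
        # repeated failure: treat this core as suspect and try the next one
    raise RuntimeError("unrecoverable: every core failed repeatedly")


def make_flaky_op(bad_cores, transient_failures=0):
    """Hypothetical operation: always faults on bad_cores, and faults
    transient_failures times on good cores before succeeding."""
    remaining = {"n": transient_failures}

    def op(core):
        if core in bad_cores:
            raise TransientFault(f"core {core} check failed")
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFault("one-off upset")
        return 6 * 7

    return op


# A one-off upset is absorbed by a retry on the same core.
print(run_with_retry(make_flaky_op(set(), transient_failures=1), cores=[0, 1]))
# A core that keeps failing gets its work moved to the next core.
print(run_with_retry(make_flaky_op({0}), cores=[0, 1]))
```

The caller never sees the fault in either case; only when every core fails repeatedly does the error surface, which corresponds to the hard fault mentioned above.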
  • In the late 70's TC May, a scientist working at Intel, proved that cosmic rays could flip bits... given that discovery was many years ago, it seems rather clear that as chips get smaller, etc. that cosmic ray detection could be a good thing on chips. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1479948 [ieee.org]
  • by sbaker (47485) * on Saturday March 08 2008, @11:41AM (#22687258) Homepage
    For RAM - there is really no problem - just use error checking. It's got to be easier to add an extra couple of bits to the width of your RAM to permit error-correction than to have a cosmic ray detector for every single bit.

    The tricky problem isn't RAM - it's computational elements. There is no single way to error-correct computational elements because they are so diverse. A multiplier would need different protection to an adder which is different from a shift-register. Hence, the idea of rolling back (say) the last instruction executed and having a "do-over".

    But for large arrays of homogeneous circuitry - like RAM - this doesn't seem worth the effort.
    • Re: (Score:3, Informative)

      You are exactly right that the real problem is not the functionality of the memory chips, but rather the processor chips, for a number of reasons. (Having said that, it is very likely that a significant portion of the soft error problem in CPU chips is the on-chip level one cache.)

      On a regular basis I participate in the "radiation testing" of laptops intended for use on both the Space Shuttle and the International Space Station. This testing is normally done at Indiana University's Cyclotron Facility in B
  • I feel another distributed computing project coming up: after SETI@home, and Folding@home, maybe this would make for an interesting way to get statistics on cosmic rays?
  • Why don't they... (Score:4, Insightful)

    by sokoban (142301) on Saturday March 08 2008, @11:52AM (#22687296) Homepage
    ... Just mount the chips in a vertical fashion. I work in an X-ray crystallography lab and we have a large format CCD detector. It's maybe about half a foot in diameter, but because it is mounted vertically, I see a cosmic ray streak maybe once every 200 or so 40 second exposures. Compare that to a cosmic ray detector of roughly the same size which is mounted horizontally in the other side of the building. It's counting cosmic rays almost constantly.
  • This sounds similar to what DARPA's EMPiRe [darpa.mil] project is doing.
  • This subject reminds me of a paper I saw some time ago, on a way to use the cosmic rays to your advantage and breaking out of the JVM. Here's the link: http://www.cs.princeton.edu/sip/pub/memerr.pdf [princeton.edu]
  • Defensive patent. (Score:3, Interesting)

    by Bill, Shooter of Bul (629286) on Saturday March 08 2008, @12:15PM (#22687410) Journal
    It's widely acknowledged that Intel created EMF burst proof chips for the government. The technology inside of them was never publicly discussed. I think it might be similar to cosmic ray correction. They might just be patenting a subset of it now before the shrinking die sizes cause someone else to patent technology they've been using for years.
  • So they can tell now when a cosmic ray hits a chip, and correct for it. But what happens when a cosmic ray hits the cosmic-ray detector and scrambles its brains, huh? Will we need a corrector for the corrector now too? And a corrector-corrector corrector? WHERE WILL IT ALL END
  • by Xest (935314) on Saturday March 08 2008, @01:07PM (#22687656)
    Tin foil hats, for RAM!
    • Tin foil hats, for RAM!

      Oddly enough, that will not work well for direct impacts although it might be worthwhile at sea level. If you add shielding around the chip and it is directly exposed to a cosmic ray event, the shielding just serves to create a shower of particles which then affect a much larger area and transfer much more energy.
  • In order to get a good idea of whether a few bits have changed in a large RAM array due to radiation (which is all it takes... more than a couple of bits can bollix data even in ECC memory), the detector itself would have to be comparable in size to the memory array.

    It is a waste of space.

    It would be cheaper (and maybe even lighter) to just radiation-harden the chip.
  • This doesn't sound so extremely new to me. You can even download the VHDL for a rad hard Leon3 [gaisler.com] (SPARC V8 instruction set) at gaisler here [gaisler.com]. This chip covers SEU (Single Event Upsets) typical of those caused by cosmic rays.
  • Microsoft claims Vista's poor performance and unreliability are due to interference from cosmic rays. Vista makes a computer run so fast, they claim, that cosmic rays present a serious threat to the computer's stability, often resulting in lower performance than older operating systems like XP. Microsoft plans to release a cosmic ray shielding computer case, which will retail for $300, and should be released some time this month. Current Vista license holders will get a $50 discount.
  • Patents are easier to read online with Google Patents. It also lets you download a PDF.

    here [google.com]
    • That's what you are patenting: the idea! Although you are supposed to be working on something commercially viable.

      The patent office does not insist on working models unless it is an extremely unlikely idea... like perpetual motion, or free energy. There are good reasons for that.