The Courts Software

Delta Sues CrowdStrike Over Software Update That Prompted Mass Flight Disruptions (reuters.com) 78

An anonymous reader quotes a report from Reuters: Delta Air Lines on Friday sued cybersecurity firm CrowdStrike in a Georgia state court after a global outage in July caused mass flight cancellations, disrupted travel plans of 1.3 million customers and cost the carrier more than $500 million. Delta's lawsuit filed in Fulton County Superior Court called the faulty software update from CrowdStrike "catastrophic" and said the firm "forced untested and faulty updates to its customers, causing more than 8.5 million Microsoft Windows-based computers around the world to crash." [...]

Delta, which has purchased CrowdStrike products since 2022, said the outage forced it to cancel 7,000 flights, impacting 1.3 million passengers over five days. "If CrowdStrike had tested the faulty update on even one computer before deployment, the computer would have crashed," Delta's lawsuit says. "Because the faulty update could not be removed remotely, CrowdStrike crippled Delta's business and created immense delays for Delta customers." Delta said that as part of its IT-planning and infrastructure, it has invested billions of dollars "in licensing and building some of the best technology solutions in the airline industry."

This discussion has been archived. No new comments can be posted.

  • Damages never more than the software costs.

    • These exclusions don't apply to things like gross negligence or fraud, and whether or not the event was one of those is a question of fact for a jury to decide.

      • by DingerX ( 847589 ) on Saturday October 26, 2024 @08:49AM (#64895677) Journal
        The sticky parts here:
        Delta is demanding from its contractor reparations for damages that it's refusing its customers. When they reject claims for which they have responsibility, how can they go after their contractors on the same basis?
        Delta is not the only airline to use CrowdStrike, and they all had outages on July 19. Delta, however, is the only one that couldn't recover until July 24.

        Crowdstrike's test and deployment processes look like gross negligence to me: their business is having companies entrust them with access to the kernel and deploying timely and safe updates. Everyone else uses a robust testing process including staggered deployment.
        But Delta, in going about a lawsuit, will be required to reveal their own IT processes and shortcomings that led to a five-day collapse.

        Or they settle for zero dollars.
        • by djp2204 ( 713741 ) on Saturday October 26, 2024 @10:59AM (#64895833)

          The process that Delta uses may not be relevant. What matters is what Crowdstrike said its processes are, what the contract between Delta and Crowdstrike says, and what a reasonable person would do based on those assurances. Crowdstrike publicly admitted that it did not test the update. If the contract between Delta and Crowdstrike says that Crowdstrike shall test all updates before deployment on infrastructure that duplicates Delta infrastructure because Delta doesn't want to do that testing itself, Crowdstrike is cooked.

          • I didn't follow this one beyond the basics. Did crowdstrike really admit in public or to Delta lawyers that they didn't test before releasing??

            • I'm not sure if they publicly admitted it like the OP stated, but it would be hard to argue to anyone with even half a brain that they did test. When your update fails in virtually 100% of the cases, you're going to have to show some extraordinary proof that you tested it and the test(s) passed.
              • The software was not tested as released.

                They have extensive testing, and tested both the software and data file before it was released.

                It failed however because they didn't test these as part of the distributed binary package.

                The tests running against the software were basically not using the latest data file. The same goes for the data file, which was tested against software different from the release: an older version.

                In testing, because the way they tested - they never actually activated the code path that caus

                • > I've been in meetings where the decision to do functional or full end to end testing is deemed not worth while because there are all these wonderful tests for the individual components.

                  We were in the same meeting.

                  The process went like this:
                  1) "we don't need a QA team, the devs test their own work" followed by serious outage and data loss
                  2) "we don't need end to end testing, QA has 762 unit tests" followed by serious outage and data loss
                  3) "we are going to skip end to end testing on this release, $impor

                  • by smillie ( 30605 )

                    Important systems should have redundant backup/restore options.

                    Where I worked (now retired) as sysadmin, every system had two boot disks. If a system was patched and failed on reboot for any reason, the system was booted off the unpatched second disk. If the second disk failed because lightning struck the system, we would boot to the unpatched hot spare system, which also had dual boot disks. We also had multiple levels of remote backup to restore every system.

                    I don't know at what level Delta made their d

              • But they did test it in the only circumstance in which they warrant it will function in the TOS.
          • by mspohr ( 589790 )

            Does the contract really have to say that CrowdStrike must test updates?
            Isn't that just assumed?
            I mean, isn't it just basic software development practice to actually test software before you release it?

        • by Frobnicator ( 565869 ) on Saturday October 26, 2024 @11:24AM (#64895853) Journal

          Delta is not the only airline to use CrowdStrike, and they all had outages on July 19. Delta, however, is the only one that couldn't recover until July 24.

          That's one of the strongest parts of the defense against the bulk of the lawsuit.

          The hours immediately after the issue? Absolutely the company is on the hook there, and their insurance is already paying out a fortune for those claims. No question about the liability there.

          But after that? While the core systems were down for hours airlines had to move things around. All the other airlines were out for about an additional day to reposition their flight crews and aircraft. If you want to look at what is reasonable, look to the rest of the industry. The fix for boot-looping computers was available within a couple hours, people were fixing the file, and while it was annoying to manually deploy everybody else had their core systems done in a day or two. The first day CrowdStrike should absolutely be liable. The second day it becomes a sliding window, as the other companies were able to restore operations, it was a race that says more about the IT department of the company and less about CrowdStrike's deployment.

          By the end of day two it's hard to see much liability for CrowdStrike.

          The flip side of the coin is Delta's own policies for people affected by their shutdown. They're demanding from CrowdStrike what they refuse to give their own customers for Delta's part of the outage. While that's not evidence toward CrowdStrike liability, it is relevant toward actual damages Delta faced and therefore damages CrowdStrike may have to pay.

          I would absolutely love it if the damages were issued in the form of a voucher for future CrowdStrike purchases, the same thing Delta offers so many of its customers after Delta damages flyer's property and costs fortunes in changed travel plans. "Yes CrowdStrike is at fault, and we apologize for any plans that may have been affected. Here's a voucher towards future CrowdStrike goods and services that must be used within 365 days. Thank you for using Falcon."

        • When they reject claims for which they have responsibility, how can they go after their contractors on the same basis?

          I don't know why they decided to reject those claims, but I'm pretty sure I know why they're going after CrowdStrike this way. The suit is being put together by lawyers and they're going to start off demanding everything they can think of so that they can use the more outlandish claims as bargaining chips.
        • Lawyers don't just come at these things swinging. They are going to have a theory of litigation, and there has almost certainly been a trail of backroom negotiations that have fallen apart, so now the lawyers are going to start taking skulls. If that theory of litigation is not structured around failing processes, then Crowdstrike won't really have any reasonable grounds to start demanding to know about that.

          Regardless, an organization as big as Delta is very likely to be following pretty standard ISO change manage

    • by gweihir ( 88907 ) on Saturday October 26, 2024 @02:43PM (#64896123)

      Crowdstrike committed negligence that could not be more gross. In that case, all contractual limitations go out the window. You have to at least get the very basics right when you sell a product or service. Crowdstrike did not.

    • Major corporations don't sign or agree to standard EULAs when they purchase software. They always negotiate the terms.

  • by crunchy_one ( 1047426 ) on Saturday October 26, 2024 @08:17AM (#64895649)
    I would not like to be the person who authored the bad patch at CrowdStrike. Not one bit. I imagine their name will come out at some point during the litigation of this case, and at that point their career and any semblance of a decent life will be over. Never mind that they worked for a company that clearly had just about the worst release process, if you can call it that, on the planet. This situation calls for a scapegoat and that lone coder will be it.
    • It's possible that the defendant's lawyers may try to cast a shadow on the coder, but the court is likely to consider all aspects of the event, including very poor processes by the company at large. After all, any revision and release process needs to take into account problematic or dangerous updates, and so while you might be able to blame a coder or a code group for buggering it up, the update was green lit, suggesting there are severe systemic issues.

    • by bjoast ( 1310293 ) on Saturday October 26, 2024 @09:21AM (#64895727)
      This guy's name may come out, yes, but I think you are exaggerating the industry's perception of who was responsible for this disaster. This was caused by a systemic problem linked to CrowdStrike's inadequate integration testing, not just the result of one developer's mistake. Based on the information available, most people should understand that there exists no reason to blacklist him from the profession, as you seem to believe.
    • I imagine their name will come out at some point during the litigation of this case, and at that point their career and any semblance of a decent life will be over.

      Only truly stupid companies wouldn't hire the person. A person is incapable of making a mistake like this. A mistake like this requires a fundamental lack of quality control systems and processes. There is a name you can directly blame. George Kurtz, CEO and Founder of a company who demonstrated that they don't take any basic QA/QC precautions.

    • "I would not like to be the person who authored the bad patch at CrowdStrike."

      I would not like to be the person who authored a parser with no input validation,
      I would not like to be the person who put code with no error handling in an environment where an error can bring down the entire machine.

      There are many things that had to go wrong, and a bad patch is neither necessary nor sufficient to cause this scenario.
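
      As a minimal sketch of the input validation and error handling this comment has in mind: the file layout below (a 4-byte magic, a little-endian uint32 field count, then that many uint32 fields) is purely hypothetical and is not CrowdStrike's actual format, and the names MAGIC, parse_content and load_or_keep_previous are invented for illustration. The point is only that malformed input gets rejected and the caller falls back to the last known-good content instead of failing hard.

```python
import struct

MAGIC = b"CFG1"  # hypothetical magic bytes; not a real vendor format


class InvalidContentError(Exception):
    """Raised when a content file fails validation."""


def parse_content(blob: bytes) -> list[int]:
    """Parse the hypothetical layout: magic, uint32 count, then `count` uint32 fields."""
    if len(blob) < 8:
        raise InvalidContentError("file too short for header")
    if blob[:4] != MAGIC:
        raise InvalidContentError("bad magic bytes")
    (count,) = struct.unpack_from("<I", blob, 4)
    expected = 8 + 4 * count
    # Check the declared field count against the real size before reading any field.
    if len(blob) != expected:
        raise InvalidContentError(f"expected {expected} bytes, got {len(blob)}")
    return list(struct.unpack_from(f"<{count}I", blob, 8))


def load_or_keep_previous(blob: bytes, previous: list[int]) -> list[int]:
    """Keep the last known-good content when the new file is rejected."""
    try:
        return parse_content(blob)
    except InvalidContentError:
        return previous
```

      With this shape, a zero-filled or truncated file raises InvalidContentError inside parse_content, and load_or_keep_previous simply returns the previous content.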

    • Most software development teams in 2024 have plenty of change management processes to ensure that a single incompetent or rogue developer can't single-handedly push an update that breaks the system. It's extremely common for all software changes (and yes, that includes config) to require approval from multiple other developers, pass many unit tests, and in some cases pass automated integration tests. In addition to that, the company could have deployed the changes in waves instead of all at once. A failu
    • by gweihir ( 88907 )

      This was not the fault of a single person. This was grossly inadequate processes, lack of risk management and general lack of technological skill on the level of the organization. This is typically called organizational failure. Any halfway competent IT process expert will see it that way, as will any halfway competent IT risk manager.

    • It was only a matter of time before something in crowdstrike turned it into a giant catastrophe. Any time a single piece of software tries to do this many things at once, with minimal user interaction, it's bound to go sideways. It doesn't matter whose patch did it; if it hadn't been that one, we would have seen another do it soon enough.

      The best possible outcome of this would be for crowdstrike to be forced into complete bankruptcy and liquidation. Crowdstrike is a total disaster, and was before this - i
    • I'm not even in the industry, but all industries and human nature are the same. Comments like "I would not like to be the person who authored the bad patch at CrowdStrike" might or might not be valid. He might indeed get hung out to dry by corporate execs as the scapegoat for their own screwups, so yes, not fun to be him. But that does not inherently impeach him.

      Code is complex. Sure, there might be some people or shops with a habit of incompetence, but even the best programmers and places can have ina

    • The problem wasn't someone writing a bad patch:

      • Some computer in the deployment chain crashed at a critical point after sectors had been allocated for a file but before the content had been written. This meant the allocated sectors were zero-filled.
      • When the system came back up, it saw that the file was present and continued as though it had been written.
      • There was no verification at any point in the process, so the zero-filled content was pushed out and loaded by clients.
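
      Taking the failure mode described in the list above at face value, a pre-publish check along these lines would have caught it. This is a sketch under assumptions: the function names are invented, and it presumes the build step records a SHA-256 digest of the intended content so a later stage can compare against it.

```python
import hashlib


def is_zero_filled(data: bytes) -> bool:
    """True when the file has content but every byte is 0x00."""
    return len(data) > 0 and data.count(0) == len(data)


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Refuse to publish empty, zero-filled, or digest-mismatched files."""
    if not data or is_zero_filled(data):
        return False
    return hashlib.sha256(data).hexdigest() == expected_sha256
```

      The zero-fill test is redundant once the digest comparison exists, but it is cheap and gives a clearer reason for rejection when allocated sectors were never written.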

      It's essentially the same problem yo

  • I'm talking to you, Delta.

  • But it was everyone. Crowdstrike was clearly the problem.
    • (This was meant in reply to the comment above saying Delta should have tested it)
    • by evanh ( 627108 )

      Everyone should be testing before deploying. The fact that it hit so many in modern times shows up the bad practice creeping in.

      • https://www.crn.com/news/secur... [crn.com]

        Bad practices that vendors force on you.

        • Perhaps, but when vendors force practices that are egregious enough, I start looking for a new vendor.
        • Bad practices that vendors force on you.

          counterpoint: Delta chose a vendor that provided direct updates.

          It used to be standard practice to employ an admin whose job was to test updates before rolling them out to production machines. This costs money (for staff and a test environment) and slows deployment.

          A decision was made that the speed and efficiency of direct updates was more important than the risk of a bad update from the vendor.

          • That's not exactly a counterpoint when the argument is that executives are responsible because of executive decisions.

      • Testing what? It's not some niche application that fell apart under narrow conditions. All the customers did was "draw water from the well," and you can't expect people to test the well every time they dip a bucket.
    • by gweihir ( 88907 )

      Not everyone. Everyone incompetent enough to rely on Microsoft products for critical functions.

  • One could guess that the EULA says the software does not have to work and you use it at your own risk.

    https://www.crowdstrike.com/en... [crowdstrike.com]

    • Which is great! We can strike EULAs out of law, as they should be. A contract you can't negotiate isn't a contract.
      • by hwstar ( 35834 )

        A contract you can't negotiate is called a "contract of adhesion".

        "For a contract to be treated as a contract of adhesion, it must be presented on a standard form on a "take it or leave it" basis, and give one party no ability to negotiate because of their unequal bargaining position."

        https://en.wikipedia.org/wiki/Standard_form_contract#Contracts_of_adhesion

        This is what we see for most consumer contracts.

        That said, if two companies are large enough they will negotiate a custom contract. Usually the largest co

        • Honest question: does the law state that it's legal for the creator of the contract of adhesion to alter the terms whenever they want as a take-it-or-leave-it opt-out agreement and that all changes are legally binding (assuming they don't violate any other laws)?
          • by hwstar ( 35834 )

            I don't know. I'm an engineer, not a lawyer. I worked for a fortune 500 company before retiring so I got familiar with the processes they use.

      • by gweihir ( 88907 )

        Ever heard of gross negligence? You are always responsible for that, no matter what you put into the contract.

    • 6. No Warranty.

      6.1 Disclaimer. THE SOFTWARE AND ALL OTHER CROWDSTRIKE OFFERINGS ARE PROVIDED “AS-IS” AND WITHOUT WARRANTY OF ANY KIND. CROWDSTRIKE AND ITS AFFILIATES DISCLAIM ALL OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, STATUTORY OR OTHERWISE.

      Sounds fairly straight forward. Did Delta not read the agreement?

      • > Sounds fairly straight forward

        That's not how the law works.

        Unconscionable terms and acts of fraud and gross negligence are frequently found to be outside of contract terms.

        It's far from straightforward. Partially self-serving on the part of lawyers.

      • They can disclaim statutory warranties all they want, but the law says that they still apply.
      • by gweihir ( 88907 )

        Unlike you, Delta knows what gross negligence is.

    • There is no way they just clicked "yes" to the EULA. Major corporate software deals are always negotiated, and reviewed by lawyers. The rules that apply to us little people, don't apply to the big fish.

  • Forced reboots are why many people use unsupported operating systems as they no longer reboot for updates disrupting workflows.
    • Forced reboots are why many people use unsupported operating systems as they no longer reboot for updates disrupting workflows.

      Microsoft doesn't force any reboots. Microsoft displays a warning on the screen that during the next reboot or shutdown an update will apply. Microsoft provides an offline timer to apply updates out of hours when the PC isn't in use. Microsoft provides APIs that allow state recovery on reboot, as well as APIs to lock out reboots, preventing work from being lost.

      If you are losing work you definitely have someone to blame, but it's not Microsoft. Start with your own IT department (yes ours forces reboots on our machine

    • If your systems are so fragile you can't reboot for an update without going offline you need to look more closely at your systems.

      That's called a "single point of failure" and is considered a "bad thing".

      At early startups, sure, we crossed our fingers and had plenty of outages but as the companies grew and matured we eliminated every spof we knew about or stumbled into the hard way.

      Why are mature companies running known spof critical systems?

  • How did that Clownstrike failure sneak into Delta's backups/snapshots? Asking for a friend.

    • by gweihir ( 88907 )

      Tell that friend to find out what actually happened and, until he does, to stop asking stupid questions.

  • Maybe the worst thing is that CS's practices came AS A COMPLETE SURPRISE to everyone,

    How many other companies out there are you currently relying on? What are their practices?

    Delta, like many, took a decision to trust CS and whatever claims CS made about their products. Well, trust works for the good guys. But it's the bad guys that are the problem. And as the old Marx Brothers joke goes, if you can fake that then you've got it made. The bad companies are deceiving you into blindly trusting them.

    The only an

  • CrowdStrike was at fault, but Delta owns the blame here. Delta had no way to recover in a timely manner; Delta didn't demand a Test QA before promoting changes to Prod. A $5 "break glass" USB could have been in a ziplock-type bag attached to each PC to boot to a base image and get back online; there are dozens of remote management recovery solutions available.

    • As others have noted, a bunch of old sysadmins appraising things in terms of uptime are not doing the same thing as a group of lawyers examining a contract.

  • They are both Delaware corporations, I would think the first thing CrowdStrike is going to do is point out this obvious fact. Delta is obviously going for "Home Field Advantage" since their headquarters is in Atlanta, but the correct state for this suit should be Delaware.
  • ... would be for crowdstrike to go completely bankrupt and enter immediate liquidation. Crowdstrike has been a disaster much longer than many people realize; this problem was just waiting to happen. This needs to go quickly into the trashbin of digital history along with Microsoft Bob and so many other bad ideas. Hopefully the correct lessons will be learned from this problem, though somehow I doubt it.
  • This airline has gotten cheap. Buying old-ass planes and not updating them, refusing to help customers, and their business is susceptible to a single point of technology failure. It's not the first time it's happened in the past several years. The only reason I still fly with them is status, and they don't fly a lot of tiny-ass CRJs (American, ahem) that can't hold a roll-aboard. United is looking more appealing nowadays.

  • Cowboys exist everywhere, the building trades, and even in software development.

  • Microsoft's actions potentially violated multiple criminal laws:
    Criminal Code Section 430 (Mischief):
    - Willfully rendering computer data useless (that all-zeros file)
    - Interfering with lawful computer use
    - Creating danger to life (hospitals, airlines, infrastructure)
    - Failing basic duty of care in software deployment
    Criminal Code Section 346 (Extortion):
    - Forcing users to accept potentially damaging updates
    - Creating a situation requiring payment to restore function
    - H
