All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…
Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.
The number of servers running Windows out there is depressing to me
>Make a kernel-level antivirus
>Make it proprietary
>Don’t test updates… for some reason??
I mean, I know it’s easy to be critical, but this was my exact thought: how the hell didn’t they catch this in testing?
I have had numerous managers tell me there was no time for QA in my storied career. Or documentation. Or backups. Or redundancy. And so on.
Just always make sure you have some evidence of them telling you to skip these.
There’s a reason I still use lots of email in the age of IM. Permanent records, please. I will email a record of in person convos or chats on stuff like this. I do it politely and professionally, but I do it.
A lot of people really need to get into the habit of doing this.
“Per our phone conversation earlier, my understanding is that you would like me to deploy the new update without any QA testing. As this may potentially create significant risks for our customers, I just want to confirm that I have correctly understood your instructions before proceeding.”
If they try to call you back and give the instruction over the phone, then just be polite and request that they reply to your email with their confirmation. If they refuse, say “Respectfully, if you don’t feel comfortable giving me this direction in writing, then I don’t feel comfortable doing it,” and then resend your email but this time loop in HR and legal (if you’ve ever actually reached this point, it’s basically down to either them getting rightfully dismissed, or you getting wrongfully dismissed, with receipts).
My engineering prof in uni was big on journals/log books for CYA, and it’s stuck with me. I write down everything I do during the day: research, findings, etc. Easily the best bit of advice I ever got.
>Permanent records, please.
The issue with this is that a lot of companies have a retention policy that only retains emails for a particular period, after which they’re deleted unless there’s a critical reason why they can’t be (e.g. to comply with a legal hold). It’s common to see 2, 3 or 5 year retention policies.
Unless their manager works in Boeing.
There’s some holes in our production units
Software holes, right?
…
…software holes, right?
Move fast and break things! We need things NOW NOW NOW!
Push it onto the technical debt pile. Then never pay off the technical debt
Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can’t test for.
But this… This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn’t raises massive questions about the safety and security of Crowdstrike’s internal processes.
>most basic of testing
“I ran the update and now shit’s proper fucked”
That would have been sufficient to notice this update’s borked
I think when you are this big you need to roll out any updates slowly, checking along the way that all is good.
The failure here is much more fundamental than that. This isn’t a “no way we could have found this before we went to prod” issue, this is a “five minutes in the lab would have picked it up” issue. We’re not talking about some kind of “Doesn’t print on Tuesdays” kind of problem that’s hard to reproduce or depends on conditions that are hard to replicate in internal testing, which is normally how this sort of thing escapes containment. In this case the entire repro is “Step 1: Push update to any Windows machine. Step 2: THERE IS NO STEP 2”
There’s absolutely no reason this should ever have affected even one single computer outside of Crowdstrike’s test environment, with or without a staged rollout.
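To put numbers on “five minutes in the lab”: even a dumb smoke-test gate like the sketch below would have flagged a build that bluescreens every Windows box it touches. This is not anyone’s actual pipeline; the VM name, the deploy script, and the timings are all invented for illustration.

```python
# Hypothetical "five minutes in the lab" gate: push the candidate build to one
# throwaway Windows VM and refuse to ship if the box doesn't come back.
# The VM name, the deploy script, and the timings are all invented here.
import subprocess
import sys
import time

TEST_VM = "win-canary-01"     # hypothetical lab VM
BOOT_WAIT_SECONDS = 300       # give it time to apply the update and reboot
PING_ATTEMPTS = 5

def vm_responds(host):
    """True if the host answers a single ping (Windows-style '-n 1')."""
    result = subprocess.run(["ping", "-n", "1", host], capture_output=True)
    return result.returncode == 0

def main():
    # Step 1: push the candidate build to the lab VM.
    # 'deploy_update.py' is a stand-in for whatever internal tool does the push.
    subprocess.run([sys.executable, "deploy_update.py", "--host", TEST_VM], check=True)

    # Step 2: wait out the reboot, then check the machine is still alive.
    time.sleep(BOOT_WAIT_SECONDS)
    for _ in range(PING_ATTEMPTS):
        if vm_responds(TEST_VM):
            print("Lab VM survived the update; OK to widen the rollout.")
            return
        time.sleep(30)

    print("Lab VM never came back. Do NOT ship this build.")
    sys.exit(1)

if __name__ == "__main__":
    main()
```

The entire class of “bricks every machine on boot” failures dies at a gate like that.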
God damn, this is worse than I thought… This raises further questions… Was there NO testing at all??
Tested on Windows 10S
My guess is they did testing but the build they tested was not the build released to customers. That could have been because of poor deployment and testing practices, or it could have been malicious.
Such software would be a juicy target for bad actors.
Agreed, this is the most likely sequence of events. I doubt it was malicious, but definitely could have occurred by accident if proper procedures weren’t being followed.
Yes. And Microsoft’s
How exactly is Microsoft responsible for this? It’s a kernel level driver that intercepts system calls, and the software updated itself.
This software was crashing Linux distros last month too, but that didn’t make headlines because it affected fewer machines.
From what I’ve heard, didn’t the issue happen not solely because of the CS driver, but because of an MS update that was rolled out at the same time, and the changes that update made caused the CS driver to go haywire? If that’s the case, there’s not much MS or CS could have done to test it beforehand, especially if both updates rolled out at around the same time.
I’ve seen zero suggestion of this in any reporting about the issue. Not saying you’re wrong, but you’re definitely going to need to find some sources.
Are there any links to this?
My apologies I thought this went out with a MS update
From what I’ve heard, and to play devil’s advocate, it coincided with Microsoft pushing out a security update at basically the same time, which caused the issue. So it’s possible that they didn’t have a way to test it properly, because they didn’t have the update at hand before it rolled out. So the fault wasn’t only a bug in the CS driver, but in the driver’s interaction with the new Windows update, which they didn’t have.
How sure are you about that? Microsoft very dependably releases updates on the second Tuesday of the month, and their release notes show if updates are pushed out of schedule. Their last update was on schedule, July 9th.
I’m not. I vaguely remember seeing it in some posts and comments, and it would explain things pretty well, so I kind of took it as likely. In hindsight, you’re right, I shouldn’t have been spreading hearsay. Thanks for the wake-up call, honestly!
You left out
>Pushed a new release on a Friday
You left out > Profit
Oh… Wait… Hang on a sec.
Lots of security systems are kernel-level (at least partially); this includes SELinux and AppArmor, by the way. It’s a necessity for these things to actually be effective.
You missed the most important line:
>Make it proprietary
Never do updates on a Friday.
Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao
I always wondered who even used Windows Server, given how marginal its market share is. Now I know, from the news.
Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.
I’m not getting an account on Statista, and I agree that its market share isn’t “marginal” in practice, but something is up with those figures, since internet-hosted services overwhelmingly run on top of Linux. Internal servers may be a bit different, but I’d expect “servers” to count internet servers…
Most servers aren’t Internet-facing.
There are a ton of Internet-facing servers running Linux: the vast majority of cloud instances, and every cloud provider’s own infrastructure except Microsoft’s (and their in-house “Windows” for Azure hosting is somehow different, though they aren’t public about it).
In terms of on-premise servers, I’d even say HPC groups may outnumber internal Windows servers. While there are relatively fewer HPC installations, each represents racks and racks of servers, and that market is 100% Linux.
I know a couple of retailers and at least two game studios are keeping at-scale Windows a thing, but Linux mostly dominates my experience of large-scale, on-premise scale-out deployments.
Linux is just so likely in scenarios with lots of horizontal scaling that it’s hard to imagine Windows still holding a majority share of the market when all is said and done, given that it’s usually deployed in smaller quantities in any given place.
It’s stated in the synopsis, below where it says you need to pay for the article. Anyway, it might be true, as the hosting servers themselves often host up to hundreds of Windows machines. But it really depends on what is measured and the method used, which we don’t know, because who the hell has a Statista account anyway.
>internet-hosted services overwhelmingly run on top of Linux
This is a common misconception. Most internet hosted services are behind a Linux box, but that doesn’t mean those services actually run on Linux.
This is a CrowdStrike issue specifically related to the Falcon sensor. It happens to affect only Windows hosts.
Well, I’ve seen some, but they usually don’t have automatic updates and generally do not have access to the Internet.
It’s only marginal for running custom code. Every large organization has at least a few of them running important out-of-the-box services.
Not too long ago, a lot of Customer Relationship Management (CRM) software ran on MS SQL Server. Businesses made significant investments in software and training, and some of them don’t have the technical, financial, or logistical resources to adapt - momentum keeps them using Windows Server.
For example, small businesses that are physically located in rural areas can’t use cloud-based services because rural internet is too slow and unreliable. It’s not quite the case that there’s no amount of money you can pay for a good internet connection in rural America, but the last time I looked into it, Verizon wanted to charge me $20,000 per mile to run a fiber optic cable from the nearest town to my client’s farm.
Almost everyone, because the Windows server market share isn’t marginal at all.
My current company does and I hate it so much. Who even got that idea in the first place? Linux always dominated server-side stuff, no?
You should read the saga of when MS bought Hotmail. The work they had to do to be able to run it on Windows was incredible. It actually helped MS improve their server OS, and it still wasn’t as performant when they switched over.
Yes, but the developers learned on Windows, so they wrote software for Windows.
In university computer science, in the States, MS server was the main server OS they taught my class during our education.
Microsoft takes a loss to let universities and students use and learn MS server for free, or at least they did at the time. This had the effect of making a lot of fresh-grad developers more comfortable with MS server, and I’m sure it led to MS server being used in cases where there were better options.
No, Linux doesn’t now nor has it ever dominated the server space.
How many coffee cups have you drank in the last 12 hours?
I work in a data center
I lost count
What was Dracula doing in your data centre?
Because he’s Dracula. He’s twelve million years old.
THE WORMS
Surely Dracula doesn’t use windows.
I work in a datacenter, but no Windows. I slept so well.
Though a couple years back some ransomware that also impacted Linux ran through, but I got to sleep well because it only bit people with easily guessed root passwords. It bit a lot of other departments at the company though.
This time even the Windows folks were spared, because CrowdStrike wasn’t the solution they infested themselves with (they use other providers, who I fully expect to screw up the same way one day).
There was a point where words lost all meaning and I think my heart was one continuous beat for a good hour.
Did you feel a great disturbance in the force?
Oh yeah I felt a great disturbance (900 alarms) in the force (Opsgenie)
How’s it going, Obi-Wan?
Here’s the fix (or rather workaround, released by CrowdStrike):
1) Boot to safe mode/recovery
2) Go to C:\Windows\System32\drivers\CrowdStrike
3) Delete the file matching “C-00000291*.sys”
4) Boot the system normally
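If you end up scripting step 3 across a pile of machines (say from a recovery environment that has Python available, or via remote management once you can get a shell), the core of it is just deleting that one channel file. Rough sketch below; it’s not an official CrowdStrike tool, and it assumes the default Falcon install path from the advisory plus admin rights.

```python
# Minimal sketch of step 3: remove the faulty channel file.
# Assumes the default Falcon install path from the advisory and admin rights,
# run from safe mode / a recovery shell that has Python available.
import glob
import os

DRIVER_DIR = r"C:\Windows\System32\drivers\CrowdStrike"

def remove_bad_channel_file():
    matches = glob.glob(os.path.join(DRIVER_DIR, "C-00000291*.sys"))
    if not matches:
        print("No matching channel file found; nothing to do.")
        return
    for path in matches:
        os.remove(path)
        print(f"Deleted {path}")

if __name__ == "__main__":
    remove_bad_channel_file()
    # Then reboot normally (step 4).
```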
This is going to be a Big Deal for a whole lot of people. I don’t know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.
CrowdStrike: It’s Friday, let’s throw it over the wall to production. See you all on Monday!
^^so ^^hard ^^picking ^^which ^^meme ^^to ^^use
Good choice, tho. Is the image AI?
It’s a real photograph from this morning.
Not sure, I didn’t make it. Just part of my collection.
Fair enough!
We did it guys! We moved fast AND broke things!
When your push to prod on Friday causes a small but measurable drop in global GDP.
Actually, it may have helped slow climate change a little
The earth is healing 🙏
For part of today
With all the aircraft on the ground, it was probably a noticeable change. Unfortunately, those people are still going to end up flying at some point, so the reduction in CO2 output on Friday will just be made up for over the next few days.
Definitely not small: our website is down, so we can’t do any business, and we’re a huge company. Multiply that by all the companies that are down, plus lost time on projects and time to get caught up once it’s fixed, and it’ll be a huge number in the end.
GDP is typically stated by the year. One or two days lost, even if it was 100% of the GDP for those days, would still be less than 1% of GDP for the year.
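Back-of-envelope: two fully lost days is 2 / 365 ≈ 0.55% of the year, and nowhere near 100% of economic activity actually stopped, so the hit to annual GDP is some fraction of that.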
I know people who work at major corporations who said they were down for a bit, it’s pretty huge.
Does your web server run Windows? Or is it dependent on some systems that run Windows? I would hope nobody’s actually running a web server on Windows these days.
I have absolutely no idea. Not my area of expertise.
They did it on Thursday. All of SFO was BSODed for me when I got off a plane at SFO Thursday night.
Was it actually pushed on Friday, or was it a Thursday night (US central / pacific time) push? The fact that this comment is from 9 hours ago suggests that the problem existed by the time work started on Friday, so I wouldn’t count it as a Friday push. (Still, too many pushes happen at a time that’s still technically Thursday on the US west coast, but is already mid-day Friday in Asia).
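(For a concrete sense of that gap, taking US Central and Sydney as the two ends: 11 PM Thursday CDT is 4 AM Friday UTC, which is already 2 PM Friday AEST.)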
I’m in Australia so def Friday. Fu crowdstrike.
Seems like you should be more mad at the International Date Line.
Wow, I didn’t realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.
The Federal Aviation Administration (FAA) imposed the global ground stop for airlines including United, Delta, American, and Frontier.
Flights grounded in the US.
Ironic. They did what they are there to protect against. Fucking up everyone’s shit
Maybe centralizing everything onto one company’s shoulders wasn’t such a great idea after all…
Wait, monopolies are bad? This is the first I’ve ever heard of this concept. So much so that I actually coined the term “monopoly” just now to describe it.
Crowdstrike is not a monopoly. The problem here was having a single point of failure, using a piece of software that can access the kernel and autoupdate running on every machine in the organization.
At the very least, you should stagger updates. Any change done to a business critical server should be validated first. Automatic updates are a bad idea.
Obviously, crowdstrike messed up, but so did IT departments in every organization that allowed this to happen.
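To be concrete about “stagger updates”: the usual shape is ring-based deployment with a validation gate between rings. A rough sketch, with the host lists, health check, and soak times invented for illustration; in practice you’d drive this from your endpoint-management tooling rather than a standalone script.

```python
# Rough sketch of a ring-based (staggered) rollout with a validation gate
# between rings. Host lists, the health check, and soak times are invented
# for illustration only.
import time

RINGS = [
    ("canary", ["lab-vm-01", "lab-vm-02"], 60),     # throwaway test boxes, 60 min soak
    ("pilot",  ["it-desk-01", "it-desk-02"], 240),  # IT's own workstations, 4 h soak
    ("broad",  ["everything-else"], 0),             # the rest of the fleet
]

def deploy_to(hosts):
    # Stand-in for the real deployment call (SCCM/Intune/whatever you use).
    print(f"Deploying update to: {hosts}")

def ring_is_healthy(hosts):
    # Stand-in health check: in reality, query monitoring for boot loops,
    # BSOD counts, missing agent heartbeats, etc.
    print(f"Checking health of: {hosts}")
    return True

def staged_rollout():
    for name, hosts, soak_minutes in RINGS:
        deploy_to(hosts)
        time.sleep(soak_minutes * 60)   # let problems surface before widening
        if not ring_is_healthy(hosts):
            print(f"Ring '{name}' is unhealthy; halting rollout here.")
            return
    print("Rollout complete.")

if __name__ == "__main__":
    staged_rollout()
```

The point isn’t the script, it’s the gate: nothing reaches the broad ring until the smaller rings have soaked without incident.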
You wildly underestimate most corporate IT security’s obsession with pushing updates to products like this as soon as they release. They also often have the power to make such nonsense the law of the land, regardless of what best practices dictate. Maybe this incident will shed some light on how bad an idea auto-updates are and get C-levels to do something about it, but even if they do, it’ll only last until the next time someone gets compromised by a flaw that was fixed in a dot-release.
Monopolies aren’t absolute, ever, but having nearly 25% market share is a problem, and is a sign of an oligopoly. Crowdstrike has outsized power and has posted article after article boasting of its dominant market position for many years running.
I think monopoly-like conditions have become so normalised that people don’t even recognise them for what they are.
Someone should invent a game, that while playing demonstrates how much monopolies suck for everyone involved (except the monopolist)
And make it so you lose friends and family over the course of the 4+ hour game. Also make a thimble to fight over, that would be dope.
Get your filthy fucking paws off my thimble!
I’m sure a game that’s so on the nose with its message could never become a commercialised marketing gimmick that perversely promotes existing monopolies. Capitalists wouldn’t dare.
I mean, I’m sure the companies that have them don’t think so, at least when they aren’t the cause of multi-industry collapses.
Yes, it’s almost as if there should be laws to prevent that sort of thing. Hmm
Well now that I’ve invented the concept for the first time, we should invent laws about it. We’ll get in early, develop a monopoly on monopoly legislation and steer it so it benefits us.
Wow, monopolies rule!
The too big to fail philosophy at its finest.
Since when has any antivirus ever had the intent of actually protecting against viruses? The entire antivirus market is a scam.
CrowdStrike has a new meaning… literally Crowd Strike.
They virtually blew up airports
An offline server is a secure server!
Honestly my philosophy these days, when it comes to anything proprietary. They just can’t keep their grubby little fingers off of working software.
At least this time it was an accident.
There is nothing less safe than a local network.
AV/XDR is not optional even in offline networks. If you don’t have visibility on your network, you are totally screwed.
The thought of a local computer being unable to boot because some remote server somewhere is unavailable makes me laugh and sad at the same time.
Clownstrike
Crowdshite haha gotem
CrowdCollapse
https://www.theregister.com/ has a series of articles on what’s going on technically.
Latest advice…
There is a faulty channel file, so not quite an update. There is a workaround…
- Boot Windows into Safe Mode or WRE.
- Go to C:\Windows\System32\drivers\CrowdStrike
- Locate and delete the file matching “C-00000291*.sys”
- Boot normally.
For a second I thought this was going to say “go to C:\Windows\System32 and delete it.”
I’ve been on the internet too long lol.
Yeah I had to do this with all my machines this morning. Worked.
Working on our units, but it only works if we are able to launch a command prompt from the recovery menu. Otherwise we get an F8 prompt and cannot start.
Bootable USB stick?
Yeah, some 95% of our end-user devices (>8000) have the F8 prompt. Logistics is losing their minds about the prospect of sending recovery USBs to roughly a thousand locations across NA.