Dateline: July 21, 2024 – Greetings from a train to Washington DC where I hope to board an evening flight to SFO. Like tens of thousands of air travelers, I’ve been Crowdstruck! I left my house this morning relatively confident that my flight from JFK to SFO was on time (famous last words). When I got to the airport, I joined a long line of stranded passengers hoping to find some way to rebook flights that were canceled due to a faulty security update pushed to Windows machines. But what exactly is that and why did it happen? Let’s explore.
When Everyone Gets a Blue Screen of Death
BSOD, the blue screen of death, has become the universal metaphor for an unscheduled operational collapse. When it happens to your Windows machine, you know you’re in for a day that is only slightly preferable to a root canal. When it happens systemically in the cloud… that’s something else entirely.
Welcome to the biggest lesson in business continuity and resiliency we’ve experienced (at least since two weeks ago, when CDK Global was hit by a ransomware attack preventing over 15,000 car dealers from writing sales contracts or lease documents for more than a week).
What happened? Apparently CrowdStrike pushed an inadequately tested security update, which resulted in BSODs on Windows desktops and servers across the globe. Microsoft’s estimate puts the number of affected devices at 8.5 million.
Funnily enough (if there is anything funny about this), the fix is a reboot – which may or may not work, just like on your own computer – but there’s a catch: you may need to reboot up to 15 times. If that doesn’t work, you have to manually remove a configuration file (which is way more time-consuming). Here are Microsoft’s CrowdStrike remediation instructions if you’d like to try to fix it yourself.
What Can We Do To Prevent This In The Future?
There is absolutely nothing you or I can personally do to prevent stuff like this. But it is so likely to happen that I wrote a detailed blog post about it almost a decade ago titled “Data Doomsday Preppers,” where I made this very serious subject a little bit more fun at the expense of the then-popular “Doomsday Preppers” cable TV show. My message was clear: it’s not a question of if, it’s just a matter of when.
This Wasn’t a Bad Guy, But What If It Was?
Someone at CrowdStrike made a mistake (albeit a big one). But what if it hadn’t been a mistake? What if a nation-state launched a coordinated attack aimed at taking down transportation, banking, and healthcare in the U.S. (or globally)? What would we do?
The facts in evidence say we’ll do what I’m doing today: try to find a workaround. But while this situation may seem merely inconvenient, the estimated costs, according to CNN, will top $1 billion. We are so obviously vulnerable to this kind of meltdown that it’s easy to imagine a version of this lasting weeks (instead of days) and costing tens of billions.
This raises the next, more important question: Are we prepared? The answer, obviously, is no. What do we need to do to prepare? Most of us are highly dependent on cloud infrastructure and best-in-class monopolistic digital service providers. We use them to run our digital lives, and with the exponential growth of AI platforms, we’re about to become even more dependent on centralized services with single points of failure. How should we prepare for the inevitable attack that will take some or all of it away?
Business Continuity & Resiliency
Everyone – not just businesses – needs an effective continuity and resiliency plan. Best practices call for a comprehensive approach that prioritizes digital readiness while also addressing broader threats. For digital preparedness, you’ll want to maintain robust data backup protocols, ensure redundancy across cloud services, and conduct regular cybersecurity training to mitigate the risk of cyber attacks. Additionally, a good plan should include responses to physical threats such as natural disasters, with established evacuation procedures and safe communication strategies. These plans need to be tested regularly through simulated disruptions to assess the operational effectiveness of both digital and physical response strategies, ensuring minimal downtime and sustained productivity in various crisis scenarios.
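For the technically inclined, here is a minimal sketch of the redundancy and testing ideas above, written in Python. Everything in it is hypothetical: the service names, the provider names, the 90-day testing window, and the `find_single_points_of_failure` helper are made up for illustration and do not come from any real tool or standard. It simply walks an inventory of the services you depend on and flags the ones with no independent fallback, or with a fallback nobody has tested lately.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    """One thing you depend on and who provides it (all names are illustrative)."""
    name: str
    primary: str
    fallbacks: List[str] = field(default_factory=list)
    # Days since the fallback was last exercised; None means it was never tested.
    last_tested_days_ago: Optional[int] = None


def find_single_points_of_failure(services, max_test_age_days=90):
    """Return (service name, reason) pairs for anything lacking a tested fallback.

    The 90-day default is an arbitrary assumption, not a recommendation.
    """
    findings = []
    for svc in services:
        # A fallback hosted by the same provider as the primary adds no redundancy.
        independent = [f for f in svc.fallbacks if f != svc.primary]
        if not independent:
            findings.append((svc.name, "no independent fallback"))
        elif svc.last_tested_days_ago is None:
            findings.append((svc.name, "fallback never tested"))
        elif svc.last_tested_days_ago > max_test_age_days:
            findings.append((svc.name, "fallback test is stale"))
    return findings


# Hypothetical inventory for a small business.
inventory = [
    Service("email", "ProviderA", ["ProviderB"], last_tested_days_ago=30),
    Service("file backup", "ProviderA", ["ProviderA"]),
    Service("payments", "ProviderC", ["ProviderD"]),
    Service("booking site", "ProviderB", ["ProviderC"], last_tested_days_ago=200),
]

for name, reason in find_single_points_of_failure(inventory):
    print(f"{name}: {reason}")
```

In this made-up inventory, only email passes. The point is not the code itself but the habit it encodes: write down what you depend on, make sure the backup does not share a provider with the primary, and actually test the backup on a schedule.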
Most big businesses do this (along with active shooter drills), but most normal people don’t. You may argue, as I did above, that there is nothing any individual or business can do in preparation for being Crowdstruck – but all of us can do our part to best position ourselves and our businesses to mitigate risk. You know the cliché, “Security is a lot like oxygen: you tend not to notice it until you begin to run out of it, and then it’s all you think about.”
Author’s note: This is not a sponsored post. I am the author of this article and it expresses my own opinions. I am not, nor is my company, receiving compensation for it. This work was created with the assistance of various generative AI models.