Following the widespread tech outage caused by last summer’s CrowdStrike update, Microsoft is taking steps to prevent a similar disaster in the future.
Nearly a year after the incident, Microsoft has introduced the Windows Resiliency Initiative, a set of architectural changes designed to eliminate single points of failure that contributed to the severity of the CrowdStrike-triggered collapse.
In response to the outage, Microsoft organized a security summit late last year, bringing together CrowdStrike and leading competitors in the endpoint protection industry. A major outcome of the summit is the plan to move third-party security software out of the Windows kernel.
The kernel is the core component of any operating system. While most applications run in user mode and can fail without impacting the rest of the system, kernel-mode software failures can bring down the entire OS. Microsoft now intends for antivirus and endpoint protection tools to operate in user mode, where crashes are isolated and less disruptive.
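The isolation idea can be illustrated with a rough analogy in Python, using ordinary operating-system processes rather than anything specific to Windows or to Microsoft's planned platform. In this minimal sketch, a hypothetical "security agent" runs in its own process; when it crashes, the supervising program notices the failure and keeps running. The names `FAULTY_AGENT`, `HEALTHY_AGENT`, `run_agent`, and `supervise` are invented for illustration.

```python
import subprocess
import sys

# Hypothetical stand-ins for a security agent. The faulty one dies with an
# unhandled exception, the healthy one finishes normally.
FAULTY_AGENT = "raise RuntimeError('bad content update')"
HEALTHY_AGENT = "print('scan complete')"


def run_agent(source: str) -> int:
    """Run the agent code in a separate process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # keep the agent's output out of our console
        text=True,
    )
    return result.returncode


def supervise(source: str) -> str:
    """Report on the agent without being taken down by its failure."""
    exit_code = run_agent(source)
    if exit_code == 0:
        return "ok"
    # The agent's crash is contained in its own process; this supervising
    # process (the analogue of the rest of the system) is unaffected.
    return "agent crashed; host still running"


if __name__ == "__main__":
    print(supervise(HEALTHY_AGENT))
    print(supervise(FAULTY_AGENT))
```

A kernel-mode driver has no such boundary around it: a comparable fault there stops the whole operating system, which is what happened to the affected machines during the outage.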
The first phase of this change will begin in July with a private preview offered to selected partners already enrolled in the Microsoft Virus Initiative (MVI) 3.0.
Microsoft’s announcement included endorsements from several security partners, such as Bitdefender, ESET, SentinelOne, Trellix, Trend Micro, WithSecure, Sophos, and CrowdStrike.
The move is part of a broader strategy to enhance the stability of Windows in the wake of last year’s crisis.
On July 19, 2024, a faulty software update released by CrowdStrike caused an estimated 8.5 million Windows devices to crash. Though the affected systems made up less than one percent of all Windows machines, the outage disrupted critical infrastructure across the globe, including the banking, healthcare, and transportation sectors.
The fallout was severe. Airlines such as United, American, Spirit, and Frontier were forced to ground flights. Delta Air Lines later filed a lawsuit against CrowdStrike, claiming $500 million in damages caused by the system failure.
CrowdStrike acknowledged in its preliminary Post Incident Review that its content validation software failed to detect the flaw. The company admitted it relied too heavily on past performance and pledged to improve its processes moving forward.