The Perils of Ignoring Cybersecurity Basics
The massive outage involving a faulty Falcon update is an excellent illustration of what happens when organizations neglect security fundamentals.
UPDATE
Back in July, roughly 8.5 million Windows devices around the world went offline after CrowdStrike released a threat intelligence update that slipped past a buggy content validator. Hospitals could not access patient records, interrupting patient care. Airlines were forced to delay or cancel thousands of flights. Some payment platforms were unavailable, resulting in people not being paid on time. The Emergency Alert System in the United States was affected, which, in turn, disrupted 911 services in several states.
The problem ultimately boiled down to an inadvertent systems failure, intensified by poor update management and by processes that violated third-party risk management policies and procedures. CrowdStrike’s quality-control testing did not catch the bug beforehand. The outage highlighted what happens when basic IT rules are forgotten, ignored, or simply abbreviated.
Cloud-based endpoint detection and response (EDR) security tools, such as CrowdStrike’s solution, work best when the sensors can process real-time intelligence from the cloud, says Eric O’Neill, a cybersecurity consultant and former undercover FBI counterintelligence operative. He notes that the outage was mainly an update management issue. Ideally, a vendor would roll out updates to a subset of its customers, then continue the rollout in stages to ensure there were no issues. In this case, he says, all customers received the update at the same time. From a third-party risk management perspective, organizations should test updates they receive before deploying them to their systems.
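The staged approach O’Neill describes can be illustrated with a minimal Python sketch. It is not CrowdStrike’s actual release process; the function name, the stage fractions, the failure threshold, and the `deploy` and `is_healthy` callbacks are all hypothetical, chosen only to show how a rollout can halt before a faulty update reaches the whole fleet.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # hypothetical threshold
) -> List[str]:
    """Deploy to growing fractions of the fleet; stop at the first bad stage.

    Returns the hosts that received the update.
    """
    updated: List[str] = []
    for fraction in stages:
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        batch = hosts[len(updated):target]
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Check only the hosts touched in this stage.
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            # Halt: the remaining hosts never receive the faulty update.
            break
    return updated
```

Called with 100 hosts, stages of `[0.01, 0.1, 0.5, 1.0]`, and a health check that always fails, the sketch updates a single host and stops. With a health check that always passes, all 100 hosts are updated across the four stages.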
In this case, most CrowdStrike customers opted for the popular automatic update installation instead of the more complex and time-consuming staged rollout, which is rarely done for endpoint applications. Because such an anomaly had never happened before with an update, the decision to forgo testing was understandable, O’Neill notes. In light of this incident, he expects to see major changes in how organizations roll out and install updates in the future.
CrowdStrike takes exception to O’Neill’s assessment that testing was incomplete, noting in a statement to Dark Reading that, “as stated in the Root Cause Analysis report: A stress test of the IPC Template Type with a test Template Instance was executed in our test environment, which consists of a variety of operating systems and workloads. The IPC Template Type passed the stress test and was validated for use, and a Template Instance was released to production as part of a Rapid Response Content update.”
Reducing the Risk
John Young, a consulting CISO and former cloud and data center executive at IBM, likens the impact of the unintended outage to the earlier cyberattacks on SolarWinds and Kaseya, but without the malicious intent behind ransomware and other malware. Instead, this became an eye-opening event that should push boards to ensure they are conducting appropriate business risk and interruption analyses. Here, only one operating system (OS), Windows, was affected. Organizations could reduce their vulnerabilities if they spread their operational risk over multiple OSes, he says.
“If we use different operating systems [for hot backup systems], we could run it at 25% service delivery level,” Young says. “We’d limp along, but we would have a real-time objective that we would recover in two days.”
Young likens this approach to enterprises having servers around the world that run multiple OSes, so that companies can protect themselves from regional threats and vulnerabilities. While running multiple OSes could protect against similar, OS-specific vulnerabilities, the arguments against it are the high cost and the low likelihood of such an event occurring again, he adds.
While it makes sense to trust key software vendors, Young notes, basic security practices indicate that software should not be trusted simply because it is from a known source and is identified as a security update. Many of the system failures were because “they didn’t really follow best practices. There was no compartmentalization. There was no business continuity planning. There was no impact analysis on the critical system,” he says. “There was too much integration with their third party.”
The Impact on Cyber Insurance
While the outage clearly was not a cyberattack, some cyber insurance policies could include coverage for dependent systems failures that are not brought on by a malicious attacker, says David Anderson, vice president of cyber liability at Woodruff Sawyer, a national insurance brokerage. Speaking about insurance coverage in general, he says a property policy might address such losses, but that depends on the negotiated policy, any extra coverages the company might have selected, and the policy’s specific language.
“A system failure event is absolutely different than a network interruption or business interruption event, which is always tied to a malicious attack,” Anderson says. “It’s important to know that not every cyber insurance policy affirmatively includes system failure coverage; you have to have purchased the enhancement in order for this event to be covered.”
This alone could get the attention of general counsel or whichever corporate executive is responsible for the company’s cyber insurance policy. Not all incidents are covered; that generally depends on the severity of the incident, the amount of loss, and the length of time the company was affected. Still, this could be a watershed moment that prompts organizations to reevaluate their existing insurance policies.
An interesting question, Anderson notes, is whether a property policy that clearly includes data processing equipment breakdown coverage, which addresses non-malicious events, would provide some coverage here. Larger commercial property policies often include coverage for human error, errors and omissions, and unplanned failures.
“It all is going to depend if the coverage is considered mechanical breakdown, which I don’t think this would be, or if it was truly a human error and unplanned outage,” Anderson notes. Again, the final decisions will be up to the insurance companies, which could interpret the situation differently.
This story was updated at 3 p.m. ET on Oct. 9 to clarify that the catalyst for the CrowdStrike outage was not a software patch or bug fix, but rather an update to its threat intelligence engine with the latest threat signatures, known as Rapid Response Content.
About the Author
Stephen Lawton is a veteran journalist and cybersecurity subject matter expert who has been covering cybersecurity and business continuity for more than 30 years. He was named a Global Top 25 Data Expert for 2023 and a Global Top 20 Cybersecurity Expert for 2022. Stephen spent more than a decade with SC Magazine/SC Media/CyberRisk Alliance, where he served as editorial director of the content lab. Earlier he was chief editor for several national and regional award-winning publications, including MicroTimes and Digital News & Review. Stephen is the founder and senior consultant of the media and technology firm AFAB Consulting LLC.