Zscaler's Cloud-Based Cybersecurity Outages Showcase Redundancy Problem
While fewer cloud providers are suffering outages, customers should prepare for the uncommon event, especially when relying on cloud services for security.
When cybersecurity software-as-a-service offerings are impacted by an outage, it can result in a significant disruption, especially if the service protects business-critical portions of a firm’s infrastructure.
In the past two weeks, for example, incidents at cybersecurity services firm Zscaler resulted in latency, packet loss, and outages for some businesses. On Oct. 25, a traffic-forwarding issue caused disruptions and packet loss for some of the cybersecurity provider’s regional customers. The previous week, the company warned clients that they could experience packet loss due to damage to a transoceanic cable near France.
Unfortunately, many companies weren’t prepared. “Took down entire company for several hours this morning,” one IT security engineer said on Twitter of the impact on his firm.
While outages of any cloud service can have a dramatic impact on business operations (an Amazon Web Services outage in December 2021 took down swaths of the web for hours, for example), cyberdefenses often hold a central position within a company’s infrastructure, which magnifies the effect of an outage. A business software-as-a-service (SaaS) application typically affects only its own user base, whereas a cybersecurity SaaS offering can disrupt multiple applications at once.
Companies need to be prepared for such outages, even if rare, says Merritt Maxim, vice president and research director for security and risk at Forrester Research.
Zscaler “is not the first cloud outage and won’t be the last, either,” he says, adding: “Cloud products and services are not necessarily more susceptible to outages than on-prem equivalents, [but] the issue is often that because of this perception, organizations do not properly assess all possible causes of cloud outages and develop mitigation plans in response to these threats.”
Indeed, the Zscaler incidents are not unique. In October 2020, an outage affected Microsoft Azure AD, the company’s SaaS identity and access management service, blocking businesses and users from connecting to their applications. A year later, a six-hour Facebook outage blocked many users — including some businesses — from using the company’s single sign-on technology and slowed many websites when scripts relying on the company’s service failed to run.
Cloud Outages Increasingly Rare, But Impactful
Overall, there has been a reduction in the number of significant outages among cloud services, with 60% of operators saying that they have had an outage in the past three years, down from 78% in 2020, according to survey results published by the Uptime Institute.
And compared with on-premises cybersecurity software and appliances, cloud-based services are generally more reliable, says Jim Reavis, CEO of the Cloud Security Alliance, an industry organization.
“Mature cloud providers, including those with cybersecurity offerings, operate much like public utilities that provide on-demand services to large customer bases,” he says. “They are generally very reliable, which is why their intermittent failures, like public utilities, are newsworthy.”
For that reason, the two disruptions weathered by Zscaler stand out — as does companies’ unpreparedness for them.
“Things will happen as long as humans and nature are involved,” says Misha Kuperman, senior vice president of cloud operations and ecosystem at Zscaler. He adds, “With the right tools, we can deliver more reliability and work around such incidents, as we have done on multiple occasions.”
Building in Cloud Security Redundancy
Businesses should also recognize that cloud security and reliability follow a shared responsibility model. Cloud providers, including cybersecurity services based in the cloud, are responsible for their infrastructure, but companies should architect their own cloud or hybrid infrastructure to handle outages.
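As a rough illustration of what architecting for outages can mean on the customer side, the Python sketch below picks the first healthy traffic path from a priority-ordered list. The gateway names and the health check are hypothetical placeholders, not part of any vendor's product or API; a real probe might measure latency or packet loss.

```python
from typing import Callable, Optional, Sequence


def pick_gateway(
    gateways: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first gateway, in priority order, that passes the health check."""
    for gateway in gateways:
        try:
            if is_healthy(gateway):
                return gateway
        except Exception:
            # A probe that raises is treated the same as an unhealthy gateway.
            continue
    # No healthy path: the caller must decide whether to block or bypass traffic.
    return None


# Hypothetical gateway names, listed in order of preference.
GATEWAYS = ["primary-region", "secondary-region", "on-prem-fallback"]
down = {"primary-region"}  # simulate an outage on the preferred path

chosen = pick_gateway(GATEWAYS, lambda name: name not in down)
print(chosen)
```

When every path is down the function returns `None`, which forces an explicit decision about whether traffic should be blocked or allowed to bypass the security service. That choice belongs in the company's own risk planning rather than in the code.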
Companies that know their cloud vendor’s architecture will be better prepared for outages, says CSA’s Reavis.
“It is important to understand how the provider achieves redundancy in its architecture, operating procedures including software updates, and its footprint of global data centers,” he says. “The customer should then understand how that redundancy satisfies their own risk requirements and if their own architecture takes full advantage of the redundancy capabilities.”
Business customers should also regularly re-evaluate their technology landscape and risk profile. Network and power failures, along with natural disasters, used to dominate resilience discussions, but malicious threats such as ransomware and denial-of-service attacks are more likely to dominate them today, says Forrester’s Maxim.
“Understanding the evolving risks and their respective impact on business is an imperative to prioritize the risks that you’d want to mitigate,” he says. A business impact analysis conducted with internal teams “allows you to create tiers for application criticality that help determine if the default resilience of cloud services is enough — or if you need a more robust solution.”
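To make the tiering idea concrete, here is a minimal Python sketch of how the results of a business impact analysis might be turned into criticality tiers. The application names, the downtime thresholds, and the rule for flagging extra resilience are all illustrative assumptions, not figures from Forrester or any other source quoted here.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    max_tolerable_downtime_hours: float  # taken from the business impact analysis
    depends_on_security_saas: bool       # does its traffic pass through the cloud service?


def criticality_tier(app: App) -> int:
    """Tier 1 is the most critical. The thresholds are assumed for illustration."""
    if app.max_tolerable_downtime_hours <= 1:
        return 1
    if app.max_tolerable_downtime_hours <= 8:
        return 2
    return 3


def needs_extra_resilience(app: App) -> bool:
    """Flag tier 1 apps whose availability hinges on the security service."""
    return app.depends_on_security_saas and criticality_tier(app) == 1


# Hypothetical inventory.
apps = [
    App("payments", 0.5, True),
    App("hr-portal", 6, True),
    App("intranet-wiki", 24, True),
]

for app in apps:
    print(app.name, criticality_tier(app), needs_extra_resilience(app))
```

In this sketch only the payments application would be flagged for a more robust setup, while the lower tiers rely on the provider's default resilience.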