2 Strategic Imperatives To Meet Modern IT Resiliency Needs
How would you answer the following questions?
- What is the risk to your business if your IT environment can’t quickly recover from an outage?
- During your last major disruption, did you even consider using your Disaster Recovery solution?
- If you were attacked by destructive malware or ransomware, are your data and infrastructure protected and recoverable?
- How resilient are your applications and IT services, given the drastic changes in our world and in how the business relies on IT?
Modern business relies on IT and expects the services it offers to be always available and resilient. Despite the adoption of advanced technologies, enterprises continue to struggle to maintain high levels of continuity for critical business functions, and real-life examples have shown eroding confidence in recovery. Compelling events such as fires, floods, hurricanes, terrorism, technology disruptions and pandemics are not new to resiliency professionals, but they have provoked the need to rethink traditional strategies and serve as a new impetus for organizations looking to prepare for the future.
As highlighted in the Gartner report 1, 2020 Strategic Roadmap for Business Continuity Management, and confirmed by our direct experience with customers, it is now imperative to provide 24/7 IT services to both internal business stakeholders and end customers. The success of your business depends on it. To achieve this, business continuity planning must enable the organization to minimize interruptions to normal daily operations by focusing on overall resiliency rather than on traditional operational silos. The plan must account for a wide variety of interruption scenarios, including local outages and ever-evolving cyber-attacks.
Imperative #1: Application Continuity
Data from the Dell Technologies Global Data Protection Index (2019) and our direct experience with customers show that CIOs have not leveraged their existing Disaster Recovery capabilities when a significant outage (site-wide or high impact) has occurred, leaving them exposed. The key reasons are:
- Traditional monolithic architectures create a complex, highly inter-dependent stack that is hard to manage and lacks granular resiliency at the application layer.
- Lack of confidence in the organization’s capability to perform a full-site recovery, due to the complexity and limitations of traditional “bubble” testing and validation.
- Disaster Recovery capacity is not adequate, and true application and infrastructure dependencies are unknown, including external connectivity to B2B partners and the supply chain.
- Declaring a disaster also means having to fail applications and infrastructure back to production later, which requires a separate outage and adds complexity.
For many customers, the complexity of recovering at a Disaster Recovery site outweighs the time required to remediate the production site outage and resume operations. The result is a very poor return on investment for Disaster Recovery assets, which cannot be justified in an era when CIOs are being asked to cut costs and optimize productivity across all IT assets, including Disaster Recovery datacenters and infrastructure. For an IT organization, this situation is untenable.
From a technology perspective, most traditional data centers are designed for stability and redundancy. The goal is to ensure that services are never interrupted and applications are never down. This model has never been realistic. Monolithic architectures and accidental growth over time create a complex, highly interdependent stack that is hard to manage and lacks granular resiliency. Instead of following this defunct specification, modern data centers should be designed with resiliency and recovery in mind. The goal is to provide a fault-tolerant fabric of infrastructure, platform resources and availability zones that applications and data flow across to meet the performance, capacity, and availability requirements of the enterprise.
By decoupling from monolithic architectures, we end up with layered abstractions in which each component of the infrastructure provides a “service” to the components above it in the stack. This always-on architecture handles the changing needs of the application by adding more “resources”, either by scaling up or by scaling out. For example, when new VMs are provisioned with Dynamic DNS with DHCP, IP services are provisioned automatically while software-defined load balancers create access to the new resource pools.
In the event of failure, chatter, performance spikes, or cyber-attacks, availability zones isolate the damage by limiting the blast radius to the affected zone, or even to a single container, while critical applications fail over to alternate datacenters. A key capability that enables this is Application Continuity.
Application Continuity is enabled by a deeper analysis of IT and applications and by creatively integrating multiple platforms and capabilities, which makes it more relevant than traditional Disaster Recovery. It is the ability to move business functions tied to predetermined groups of applications (application packages) from one site to another, or to public cloud, to mitigate planned and unplanned outages. At any given time, applications can be actively distributed across multi-cloud environments, which reduces the impact of a single-site outage and makes better use of all IT assets across the enterprise. In the graphic below, management of traditional monolithic architecture silos is converted to a modern API and platform-based IT service framework that makes Application Continuity achievable.
Customers that have implemented Application Continuity enjoy the following benefits:
- Active use of all IT assets and effective use of multi-cloud
- Better preparation for planned and unplanned outages, especially during compelling events such as pandemics
- The ability to proactively fail over applications from one site to another on a periodic basis, run there for a period and then fail back, which increases confidence in recovery. In highly regulated industries such as financial services, this capability meets the newer guidelines of regulators looking for real evidence
- Retirement of the “Disaster Recovery Test in the bubble”, which is limited in scope, expensive, time consuming and resource intensive
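To make the idea of an application package more concrete, here is a minimal Python sketch of how a failover plan might be computed when one site becomes unavailable. It is illustrative only: the `ApplicationPackage` class, the `plan_failover` and `apply_plan` functions, and the site and package names are all hypothetical and do not correspond to any particular product or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ApplicationPackage:
    """Hypothetical model: a group of applications that moves between sites as one unit."""
    name: str
    applications: list[str]
    active_site: str
    candidate_sites: list[str]  # other sites or clouds able to host the package


def plan_failover(packages, failed_site, healthy_sites):
    """Map each package running on the failed site to its first healthy candidate.

    Packages active elsewhere are left out of the plan, so the impact stays
    limited to the affected site. A value of None means no healthy target exists.
    """
    plan = {}
    for pkg in packages:
        if pkg.active_site != failed_site:
            continue
        targets = [s for s in pkg.candidate_sites
                   if s in healthy_sites and s != failed_site]
        plan[pkg.name] = targets[0] if targets else None
    return plan


def apply_plan(packages, plan):
    """Move every package that has a target site; skip packages without one."""
    for pkg in packages:
        target = plan.get(pkg.name)
        if target is not None:
            pkg.active_site = target


# Assumed example data: two packages spread across two sites and a public cloud.
packages = [
    ApplicationPackage("payments", ["api", "ledger-db"], "site-a",
                       ["site-b", "public-cloud"]),
    ApplicationPackage("reporting", ["etl", "dashboards"], "site-b", ["site-a"]),
]

plan = plan_failover(packages, failed_site="site-a",
                     healthy_sites={"site-b", "public-cloud"})
print(plan)  # {'payments': 'site-b'}
apply_plan(packages, plan)
```

The same two functions could serve a planned, periodic failover as well as an unplanned one: the only difference is whether `failed_site` reflects a real outage or a scheduled exercise, after which the package is moved back.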
Imperative #2: Cyber Recovery
In a typical enterprise, applications, systems and data are highly interconnected across production, disaster recovery sites, public clouds and data bunkers. Security teams face a herculean task in keeping malware out with advanced detection, firewalls and other techniques. Unfortunately, hackers have also become more sophisticated, and advanced phishing attacks remain very hard to stop, allowing malware to enter the IT environment and dwell there for a long period of time. Once command-and-control is established, hackers can conduct thorough reconnaissance and execute a destructive attack that compromises the network, data and backups. Numerous cyber-attacks in the past few years have shown that this is a common occurrence and that all industries are susceptible.
With a growing remote workforce and a loosening of tight controls, it is harder for security professionals to sustain a strong defense. A disappearing perimeter, caused by public cloud, mobile and remote access, creates more points of exposure and increases vulnerability to a cyber-attack.
A modern approach to Cyber Recovery is to acknowledge that a cyber-attack is inevitable despite the best defenses, and to build an “air-gapped”, “fail-safe” copy of mission-critical and business-critical data. The Cyber Recovery Solution is designed to protect an organization from emerging cyber-attacks, including data destruction, data manipulation, and data encrypted and held for ransom. An effective Cyber Recovery design reduces an organization’s risk, ensures that a fail-safe copy of data is always available, and provides the documented processes and procedures needed to protect data in the air-gapped vault and recover from it after an incident.
After learning from numerous watershed attacks in the past few years, we have come to understand that not all attacks are alike. Cyber-attacks span a wide range of threat vectors, such as persistent dormant malware, data wiping, data locking, server disablement, insider attacks, compromised backups and data theft.
A cyber recovery solution is the last line of defense. It is carefully designed to mitigate the risks of proliferating threat vectors by employing key uncompromisable characteristics that make it different from traditional data protection or disaster recovery solutions.
Organizations that have implemented a cyber recovery solution typically use the vaulted data to perform advanced analytics and forensics in the vault. This consists of inspecting the data coming into the vault every day and applying AI/ML techniques to detect indicators of compromise. Another key aspect of the cyber vault is creating advanced data recovery plans and integrating with incident response, making it part of the overall enterprise-wide resiliency program.
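As a rough illustration of what inspecting incoming vault data for indicators of compromise could look like, the Python sketch below compares the byte entropy of each file against the previous day's copy. This is an assumption made for illustration, not the method of any specific cyber recovery product: entropy is a far simpler stand-in for the AI/ML techniques mentioned above, chosen because encrypted content tends to look close to random. The function names, file names and thresholds (`jump`, `ceiling`) are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flag_suspicious(previous: dict[str, bytes], incoming: dict[str, bytes],
                    jump: float = 2.0, ceiling: float = 7.5) -> list[str]:
    """Flag files whose entropy rose sharply and is now close to the 8-bit maximum.

    Files with no earlier copy are skipped because there is no baseline.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    flagged = []
    for name, blob in incoming.items():
        if name not in previous:
            continue
        before = shannon_entropy(previous[name])
        after = shannon_entropy(blob)
        if after >= ceiling and after - before >= jump:
            flagged.append(name)
    return flagged


# Assumed example data: yesterday's and today's copies arriving in the vault.
yesterday = {
    "report.txt": b"quarterly revenue report " * 200,
    "notes.txt": b"meeting notes " * 100,
    "archive.zip": bytes(range(256)) * 4,  # already high entropy, like compressed data
}
today = {
    "report.txt": bytes(range(256)) * 16,  # now looks random, as encrypted data would
    "notes.txt": b"meeting notes " * 100,
    "archive.zip": bytes(reversed(range(256))) * 4,
    "new.bin": bytes(range(256)),  # no baseline, so it is skipped
}

flagged = flag_suspicious(yesterday, today)
print(flagged)  # ['report.txt']
```

Requiring both a sharp rise and a high absolute value keeps files that are always high in entropy, such as compressed archives, from being flagged every day. A real deployment would combine many more signals than this single heuristic.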
- Disruptive events are on the rise, with even more organizations falling victim in 2019 than in previous years, resulting in a wide range of consequences beyond financial damage
- There is also a considerable lack of confidence in the ability to recover today and widespread concern that even more disruption will be experienced over the next 12 months
- Organizations need to pay closer attention than ever to their Business Resiliency Plan and focus on integrating new capabilities, such as Application Continuity and Cyber Recovery, to protect data and “crown jewels” and enable recovery from a variety of outages
- Organizations that can demonstrate a higher level of IT resilience will use it as a competitive advantage, gain the confidence of their boards and customers, and remain relevant in the future
Please comment below with any questions or requests for clarification.