System crashes. Outages. Downtime.
These words send chills down the spines of network administrators. When business apps go down, business leaders are not happy. And the cost can be significant.
Recent IDC survey data shows that enterprises experience two cloud service outages per year. IDC research conservatively puts the average cost of downtime for enterprises at $250,000 per hour, which means just four hours of downtime can cost an enterprise $1 million.
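The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the hours-per-outage value is an assumption, since the survey figures quoted here give only the outage count and the hourly cost.

```python
# Figures quoted in the article (IDC): conservative average cost per hour
# and the number of cloud service outages per year.
HOURLY_DOWNTIME_COST_USD = 250_000
OUTAGES_PER_YEAR = 2


def downtime_cost(hours, hourly_cost=HOURLY_DOWNTIME_COST_USD):
    """Estimated cost of a single outage lasting `hours`."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * hourly_cost


def annual_downtime_cost(hours_per_outage, outages=OUTAGES_PER_YEAR):
    """Estimated yearly cost; `hours_per_outage` is an assumed input."""
    return outages * downtime_cost(hours_per_outage)


print(downtime_cost(4))         # 1000000
print(annual_downtime_cost(4))  # 2000000, assuming each outage lasts 4 hours
```

Plugging in your own outage durations and hourly cost gives a rough sense of what is at stake before you evaluate any architecture change.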
To respond to failures as quickly as possible, network administrators need a highly scalable, fault-tolerant architecture that is simple to manage and troubleshoot.
What’s Required for the Always-On Enterprise
Let’s examine some of the key technical capabilities required to meet the “always-on” demand that today’s businesses face. There is a need for:
1. Granular change control mechanisms that facilitate flexible and localized changes, driven by availability models, so that the blast radius of a change is contained by design and intent.
2. Always-on availability to help enable seamless handling and disaster recovery, with failover of infrastructure from one data center to another, or from one data center to a cloud environment.
3. Operational simplicity at scale for connectivity, segmentation, and visibility from a single pane of glass, delivered in a cloud operational model, across distributed environments—including data center, edge, and cloud.
4. Compliance and governance that correlate visibility and control across different domains and provide consistent end-to-end assurance.
5. Policy-driven automation that improves network administrators’ agility and provides control to manage a large-scale environment through a programmable infrastructure.
Typical Network Architecture Design: The Horizontal Approach
With businesses required to be “always on” and closer to users for performance reasons, applications need to be deployed in a highly distributed fashion. To accomplish this, network architects create distributed mechanisms across multiple data centers, both on-premises and in the cloud, and across geographic regions, which can help to mitigate the impact of potential failures. This horizontal approach works well by delivering physical layer redundancy built on autonomous systems that rely on a do-it-yourself approach for different layers of the architecture.
However, this design inherently imposes an over-provisioning of the infrastructure, along with an inability to express intent and a lack of coordinated visibility through a single pane of glass.
Some on-premises providers also have marginal fault isolation capabilities and limited-to-no capabilities or solutions for effectively managing multiple data centers.
For example, consider what happens when one data center, or part of one, goes down under this horizontal design. The typical response is to fix the issue in place, which lengthens the time needed to restore the application's redundancy or its availability.
This is not an ideal situation in today’s fast-paced, work-from-anywhere world that demands resiliency and zero downtime.
The Hierarchical Approach: A Better Way to Scale and Isolate
Today’s enterprises rely on software-defined networking and flexible paradigms that support business agility and resiliency. But we live in an imperfect world full of unpredictable events. Is the public cloud down? Do you have a switch failure? Spine switch failure? Or even worse, a whole cluster failure?
Now, imagine a fault-tolerant data center that automatically restores systems after a failure. This may sound like fiction, but with the right architecture it can be your reality today.
A fault-tolerant data center architecture can survive failures and provide redundancy across your data center landscapes. In other words, it provides the ultimate in business resiliency, making sure applications are always on, regardless of failure.
The architecture is designed with a multi-level, hierarchical controller cluster that delivers scalability, meets the availability needs of each fault domain, and creates intent-driven policies. This architecture involves several key components:
1. A multi-site orchestrator that pushes high-level policy to the local data center controller (also referred to as a domain controller) and delivers the separation of fault domains and the scale businesses require for global governance, with resiliency and federation of data center networks.
2. A data center controller/domain controller that operates both on-premises and in the cloud and creates intent-based policies, optimized for local domain requirements.
3. Physical switches with leaf-spine topology for deterministic performance and built-in availability.
4. SmartNICs and virtual switches that extend network connectivity and segmentation to the servers, further delivering an intent-driven, high-performing architecture that is closer to the workload.
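To make the hierarchy more concrete, here is a minimal Python sketch of the top two layers: an orchestrator that pushes a high-level policy down to per-site domain controllers, each of which is its own fault domain. Everything in it is hypothetical and for illustration only. The class names, site names, and policy strings are assumptions and do not correspond to any real product API.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Hypothetical per-site controller; each instance is one fault domain."""
    site: str
    healthy: bool = True
    policies: dict = field(default_factory=dict)

    def apply(self, name, intent):
        # A failed site rejects the change but affects no other site.
        if not self.healthy:
            raise ConnectionError(f"{self.site} controller unreachable")
        self.policies[name] = intent


@dataclass
class MultiSiteOrchestrator:
    """Hypothetical top-level orchestrator that federates the sites."""
    controllers: list

    def push_policy(self, name, intent, sites=None):
        """Push a policy to all sites, or only to `sites` to limit the
        blast radius of a change. Returns a per-site success map."""
        results = {}
        for controller in self.controllers:
            if sites is not None and controller.site not in sites:
                continue
            try:
                controller.apply(name, intent)
                results[controller.site] = True
            except ConnectionError:
                # The failure stays inside this fault domain.
                results[controller.site] = False
        return results


orchestrator = MultiSiteOrchestrator([
    DomainController("dc-east"),
    DomainController("dc-west", healthy=False),  # simulate a site outage
    DomainController("cloud-1"),
])

# A global push still succeeds at every healthy site.
print(orchestrator.push_policy("web-to-db", "allow tcp/5432"))
# {'dc-east': True, 'dc-west': False, 'cloud-1': True}

# A scoped push touches only the named site.
print(orchestrator.push_policy("canary-rule", "allow tcp/8443", sites={"dc-east"}))
# {'dc-east': True}
```

The point of the sketch is the separation of concerns: the orchestrator knows about intent and scope, while each domain controller owns its local state, so an outage at one site neither blocks nor corrupts policy at the others.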