In 2014, we realized our world in network engineering was changing rapidly. Like many IT organizations, we saw a shift to the internet and cloud services in our network traffic. Over that year alone, we observed a 200% increase in peak internet and cloud traffic. That’s when we knew our network needed to change to accommodate the evolution of conducting business at Cisco.
Our network had been designed, and had evolved, on the assumption that the bulk of our enterprise traffic originated from and was destined for resources within Cisco’s “four walls.” The growing demand for traffic to cloud providers and SaaS applications meant a quick pivot was necessary. As a result, our initial step was to build Cisco IT CloudPorts in strategic carrier-neutral facilities, which allowed us to quickly secure connections between the Cisco enterprise and the outside world.
Our CloudPort hubs provide high availability and flexibility to turn up new connectivity quickly, but our private backbone connecting the CloudPorts needed enhancement. With more and more business reliance on public cloud and SaaS workloads, the resiliency and performance required for CloudPort connections grew. We needed the capability to quickly respond to network issues and use our backbone to route traffic from one region to another.
Our CloudPorts are now interconnected with a global cloud-ready backbone that allows seamless rerouting of connectivity in case of outages or performance issues within a region. This new backbone is built on top-of-the-line routers such as the Cisco ASR9k and Cisco NCS5k series, which are optimized for internet route table scale and better programmability of Border Gateway Protocol (BGP) policies.
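To make the rerouting idea concrete, here is a minimal sketch of the decision involved: prefer the local CloudPort, and fall back to another region over the backbone when the local hub is down or degraded. It is only an illustration. In production this choice is expressed through BGP policy on the routers, not application code, and the `CloudPort` record, the `pick_egress` function, the region names, and the latency threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudPort:
    """Hypothetical record for one regional hub; fields are illustrative."""
    region: str
    healthy: bool        # False during an outage
    latency_ms: float    # measured latency toward the cloud provider


def pick_egress(local_region: str,
                ports: list[CloudPort],
                max_latency_ms: float = 150.0) -> Optional[CloudPort]:
    """Prefer the local CloudPort; otherwise fall back across the backbone."""
    usable = [p for p in ports if p.healthy and p.latency_ms <= max_latency_ms]
    for port in usable:
        if port.region == local_region:
            return port
    # Local hub is down or degraded: take the lowest-latency remote hub.
    return min(usable, key=lambda p: p.latency_ms, default=None)


# Made-up sample data: the local hub is in an outage.
ports = [
    CloudPort("us-east", healthy=False, latency_ms=20.0),
    CloudPort("us-west", healthy=True, latency_ms=70.0),
    CloudPort("emea", healthy=True, latency_ms=95.0),
]
choice = pick_egress("us-east", ports)
print(choice.region if choice else "no usable CloudPort")
```

With the sample data above, traffic from the `us-east` region would leave through `us-west`, the closest healthy hub.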
That’s where we are today. But how did we get here?
The genesis: The birth of a strategy
During a team off-site, sitting in a small restaurant near our offices in North Carolina, our group discussed how our core network design was not sustainable. We needed a big change. While the approaches and technologies weren’t anything new, the scale and breadth of such a change were significant. How would we rearchitect and deploy a change of this scale, especially given the size and complexity of Cisco’s business? And, as always, how would we deploy with minimal disruption?
Over dessert, we began to sketch and brainstorm and sketch some more, literally on the back of a cocktail napkin. (We’ve all done that, right?) We concluded that we needed to stop thinking in terms of the traditional enterprise, where communication flowed mainly between users and our private cloud. Instead, we needed to design a new architecture that would deliver optimized and resilient connectivity from on-premises resources and users to the outside world.
Our path to a solution began.
The value in the process
Over the next three years, our team looked at how to address the issue. We started by examining the existing infrastructure and making small, incremental changes to address immediate performance concerns. While we saw minor improvements, they were isolated, not necessarily repeatable at scale, and would create new complexity down the road. But this exercise provided great insight into the issues and confirmed our belief that we needed a new uniform network architecture.
Team member Oliver Agpalasin shares how our journey began: “We started with a blank canvas and set out to define the ‘future state’ of the network, putting traditional and historical thinking aside. With that architecture defined, we could then start thinking about execution and how to move to the new environment. All while recognizing the value this would bring to Cisco and the quality of experience it could provide to our internal clients.”
Changing the mindset
As an operations organization typically focused on solving day-to-day business issues, we were challenged by our inherent silos and conflicts of interest. To solve this, we adopted an agile mindset and made operations engineers the leaders of the program, freeing them from the constraints of “just keeping the lights on.” We wanted to leverage the team’s deep working knowledge of the network, break down the barriers between design and operations, and gather everyone in the same room in a series of workshops.
“We never questioned the value in the vision,” says team member Alisha Sanchez. “Adopting an agile mindset gave us the opportunity to carve that path independently and allowed us to focus on creating options, test those theories, and make informed decisions based on our findings.”
Insight from one team manager summarizes the real value of this mindset shift. As Steve Sheldon describes it, “Part of the agile methodology for me is that you’re able to make your own work. You are able to decide the best strategy as a team. In an operational role, you don’t always get that option. That’s a big mindset shift.”
Team member Prashant Bhadoria adds: “Along the way, and despite the new challenges that came up, we were always focused on choosing the best options. Having an overarching strategy in place helped us address each issue focused on that overall intent. We’re typically perfectionists, not risk-takers. But with the support of our leadership, we were encouraged to take bold steps.”
Deploying the new network
We understood our most significant challenges to be twofold:
1. Building consensus among our stakeholders
2. Deploying the new network itself
With a project of this scale, we faced significant obstacles that at first appeared insurmountable. But through the program team’s persistence and commitment, we solved them one by one.
“It was not easy to explain the business value to secure the funding and resources,” says Warren Rigney, a team manager. “But our leadership understood the risks and potential impacts of doing nothing.”
Part of the task facing us was to unravel two decades of complexity that could hinder delivery of the new architecture. Through self-written automation and auditing tools, we could visualize and continuously track all required clean-up efforts. As we peeled back the onion, we grew more confident in our ability to succeed.
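As an illustration of what such an auditing tool can look like, the sketch below scans device configurations for legacy constructs and counts what is left to clean up. It is not the team’s actual tooling: the `LEGACY_PATTERNS` rules, the `audit` function, and the sample device names and configurations are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical legacy patterns to retire; a real audit would use its own rules.
LEGACY_PATTERNS = {
    "static-route": re.compile(r"^ip route "),
    "legacy-acl": re.compile(r"^access-list \d+ "),
}


def audit(configs: dict[str, str]) -> Counter:
    """Count config lines per (device, finding) that match a legacy pattern."""
    findings: Counter = Counter()
    for device, text in configs.items():
        for line in text.splitlines():
            for name, pattern in LEGACY_PATTERNS.items():
                if pattern.match(line.strip()):
                    findings[(device, name)] += 1
    return findings


# Made-up sample configurations for two devices.
sample = {
    "rtr-a": "hostname rtr-a\n"
             "ip route 10.0.0.0 255.0.0.0 192.0.2.1\n"
             "access-list 10 permit any\n",
    "rtr-b": "hostname rtr-b\nrouter bgp 64512\n",
}
report = audit(sample)
for (device, finding), count in sorted(report.items()):
    print(device, finding, count)
```

Running a report like this on a schedule and watching the counts fall toward zero is one simple way to track clean-up progress over time.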
In the words of team member Touseef Ahmed Gulgundi, “To speed up deployments and avoid risks, we utilized Network Services Orchestrator (NSO) automation to deploy the new backbone and policies. This approach allows us to reduce the deployment time from 12 hours for the first deployment down to less than four hours for the second — an efficiency trend that continued over time.”
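For a sense of scale, the figures in that quote work out as follows. The 12-hour and four-hour values come from the quote itself; because the second deployment took less than four hours, the result is a lower bound on the time saved. The helper name `reduction_pct` is ours.

```python
def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage of time saved going from before_hours to after_hours."""
    return (before_hours - after_hours) / before_hours * 100


# 12 and 4 come from the quote; 4 is an upper bound on the second run.
saved = reduction_pct(12, 4)
print(f"at least {saved:.1f}% less time, at least {12 / 4:.0f}x faster")
```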
Because we were deploying significant changes to Cisco’s core network, a prudent approach was needed, even if it meant small delays to project timelines. We built development and test environments where we could safely validate our changes before deploying at scale in production.
Building and deploying the new backbone was one thing. We also had to make sure that our support teams would understand the new environment. The test and development networks allowed team members to spin up their own virtual instances so they could experiment freely with the new setup. We invited these teams to shadow us during the implementations and turned over the keys to those confident enough to learn the new setup during deployment. In addition, we held extensive Transfer of Information sessions to make sure everyone in the wider team could support the new solution.
IT is all about the people
Over the last year and a half, the newly formed program team put their shoulders to the wheel, and things really started to happen. The team consisted of a mix of engineers, some wanting to move very aggressively while others preferred a more prudent approach. This mix triggered good conversations (and occasional differences) that ultimately resulted in the right decisions being made. The team also worked tirelessly across time zones, through late evenings, and in meetings while supporting their families during the global pandemic. Around every corner there was often a new surprise, but the team never gave up and tackled each problem as it came.
The future of our network
This new backbone design laid the groundwork for the future and allowed us to be more agile and deliver new technical capabilities quickly while supporting our business transition and adoption of the cloud. Most importantly, the lessons learned during this program will benefit us as we keep driving innovation into Cisco’s corporate network.
What’s next? Our team is focused on expanding the network’s capabilities, including automating the resiliency of our internet network, extending the resiliency in our connectivity to cloud services, and bolstering our disaster recovery for internet services at scale.
Oh, and what became of the napkin? We still have it. We break it out every time a member of our team says, “It can’t be done!”