When I started this blog series, I had intended to present a cohesive approach to automating your network operations, eventually leading to DevOps. The intent was to take you beyond simply automating ad-hoc tasks using contrived examples to a more systematic way of automating your network operations (hence the title). Unfortunately, I completely failed because I presented it in an ad-hoc way. So in this blog, I am going to go back to the beginning: What are we trying to achieve?
Before I do that, I want to introduce a new blogger, Jason King. Jason and I have a very similar background in operations and development. We’ve both spent a large part of our careers on the front lines designing, building, and operating large systems and networks and hope to leverage that experience in our blogs, repos, and other works. With his help, hopefully we can get the series moving more smoothly.
Automated Humans vs. Automated Business
Do we just want a human to be able to perform a single operation faster? If so, then we might not actually achieve that. Why is this? Automation is geared toward deploying one or more changes across a large number of devices. However, many changes do not fall into this category.
Engineering vs. CRUD
There are basically two types of changes that are made to a network in steady state operations:
◈ Architectural/Engineering: These are changes to the architecture of the network (e.g. routing, QoS, multicast) that generally affect the entire network. They also cover the architecture for how new services are deployed (e.g. tenants, remote sites, etc.)
◈ Create, Read, Update, Delete (CRUD): These are changes that deal with delivery of network services to a particular customer or application (e.g. putting a port in a VLAN, adding an ACE to an ACL, or adding a load balancing rule).
The rigor of DevOps is absolutely the way to make major architectural changes to a network because of the network-wide effect that these changes have and the relatively small number of changes that occur.
CRUD is often different, however. While making a single change (e.g. SNMP Community Strings) on thousands of devices is an operation for which the overhead of DevOps is justified, that same overhead may significantly slow down operations involving a single change on a single device (e.g. changing a port VLAN).
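To make that trade-off concrete, here is a rough break-even sketch. The cost model (a fixed pipeline overhead per change plus a per-device time) and every number in it are assumptions for illustration, not measurements from a real network.

```python
import math


def break_even_devices(pipeline_overhead_min, manual_min_per_device,
                       automated_min_per_device):
    """Smallest device count at which the pipeline beats manual changes.

    Assumed model:
      manual time   = manual_min_per_device * n
      pipeline time = pipeline_overhead_min + automated_min_per_device * n
    Returns None when automation saves no time per device.
    """
    saved_per_device = manual_min_per_device - automated_min_per_device
    if saved_per_device <= 0:
        return None
    # The pipeline wins once n > overhead / saved_per_device.
    return math.floor(pipeline_overhead_min / saved_per_device) + 1


# Hypothetical numbers: 60 minutes of review and testing per change,
# 5 minutes per device by hand, 0.1 minutes per device when automated.
print(break_even_devices(60, 5, 0.1))  # -> 13
```

With these made-up numbers, a change touching a single device is slower through the pipeline, while anything touching 13 or more devices is faster.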
The DevOps overhead is not necessarily a bad thing, even for small changes. There are significant advantages in enforcing configuration management, revision control, code review, and testing on every change. It does not, however, always make network operations faster or easier. This increased friction can make it unpalatable to many network teams and hinder adoption.
Constraints-based IT
There is also the Theory of Constraints, which states that “any improvements made anywhere besides the bottleneck are an illusion.” (Eliyahu M. Goldratt, 2014)
This means that if you are automating processes that are not slowing down your business, you are not having an effect on your business. The entire effort is potentially a waste of time and money. That is why automation should begin with a thoughtful process that identifies the most important outputs of your infrastructure and what the current bottlenecks are in producing those outputs.
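One simple way to start that process is to write down how long a request spends in each stage of delivery and see which stage dominates. The sketch below does this with stage names and durations that are purely hypothetical; your own value chain will look different.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    hours: float  # average elapsed time a request spends in this stage


def find_bottleneck(stages):
    """Return the stage where a request spends the most elapsed time."""
    return max(stages, key=lambda stage: stage.hours)


def share_of_lead_time(stage, stages):
    """Fraction of the total lead time consumed by one stage."""
    total = sum(s.hours for s in stages)
    return stage.hours / total if total else 0.0


# Hypothetical numbers for a single service request.
stages = [
    Stage("request intake", 2.0),
    Stage("change approval", 40.0),
    Stage("device configuration", 0.5),
    Stage("validation", 1.5),
]

bottleneck = find_bottleneck(stages)
print(bottleneck.name, round(share_of_lead_time(bottleneck, stages), 2))
# -> change approval 0.91
```

In this made-up case, automating device configuration would shave half an hour off a 44-hour process. The approval step is where the time goes.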
In general, automation is going to provide your business two main improvements: faster delivery of customer value and faster time to remediation.
When your enterprise produces customer value faster (e.g. onboarding new customers and/or offering new services), the business generally brings in more revenue. Adding a faster time to remediation (e.g. less maintenance and quicker trouble resolution) reduces operating costs and increases customer satisfaction. When done right, it is a powerful combination that provides the best outcomes for any automation project.
This is why we must focus on automating business processes and not just humans. In fact, the best way to automate a business is to remove the human from the process, at least from that value chain between the customer’s request and the delivery of that request.
“If it does not have an API, it does not exist”
– Mitchell Hashimoto
Automating Business Processes: API-Driven Automation
When we consider how to automate business processes, we must focus on reducing the time between a customer requesting a new service and receiving that service. Humans, generally, are not the best way to reduce this time, which is where APIs come in. APIs allow each step of the process to be automated.
For example, when a user wants to add a firewall exception for a new server, they can go to a self-service portal to make that request. That request can then go through the review process to make sure that it is aligned with business policies (hopefully automatically) and appropriately approved. Once it is approved, the ITSM pokes the automation framework through an API to begin the delivery of the service. The advantages of this approach are that:
1. It takes the network team out of the CRUD
2. It allows the network team to define and put checks around how changes to the network are performed
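The firewall-exception flow described above can be sketched in a few lines. This is a toy, in-process model, not a real ITSM or automation framework integration: the policy values, the `FirewallRequest` class, the function names, and the rule syntax are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

# Hypothetical business policy: exceptions may only target this server
# range and these ports.
ALLOWED_DESTINATIONS = ip_network("10.20.0.0/16")
ALLOWED_PORTS = {443, 8443}


@dataclass
class FirewallRequest:
    requester: str
    destination: str
    port: int
    status: str = "submitted"
    log: list = field(default_factory=list)


def policy_review(req):
    """Automated review: does the request comply with business policy?"""
    ok = (ip_address(req.destination) in ALLOWED_DESTINATIONS
          and req.port in ALLOWED_PORTS)
    req.status = "approved" if ok else "rejected"
    req.log.append(f"review: {req.status}")
    return ok


def automation_api(req):
    """Stand-in for the automation framework's API endpoint."""
    rule = f"permit tcp any host {req.destination} eq {req.port}"
    req.status = "delivered"
    req.log.append(f"deployed: {rule}")
    return rule


def handle(req):
    """The ITSM step: review, then call the automation API if approved."""
    if policy_review(req):
        return automation_api(req)
    return None


good = FirewallRequest("alice", "10.20.5.7", 443)
bad = FirewallRequest("bob", "192.0.2.10", 23)
print(handle(good), good.status)  # -> permit tcp any host 10.20.5.7 eq 443 delivered
print(handle(bad), bad.status)    # -> None rejected
```

Notice where the network team shows up: they own `policy_review` and `automation_api`, which is how they define and put checks around changes, but they are not in the path of each individual request.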
Hold the phone!
We are going to let a customer request a complex service with potentially harmful repercussions without having a human in the loop??? Well … yes. But even if you do need a human in the loop, you don’t need an entire team of them in the loop. And in either case, that is why DevOps is important. DevOps is not just automation; it is the development, testing, deployment, and validation of the artifacts that provide the service. If you properly develop and test these artifacts, you can be reasonably assured that the process does not go pear-shaped. If you do the proper validation of deployed services, you can “fail fast” back to a previous version of the artifacts.
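The “fail fast” idea can be sketched as deploy, validate, and roll back on failure. The `Service` class, the dictionary used as an artifact, and the MTU check below are hypothetical stand-ins. A real pipeline would push configuration to devices and run real post-deployment tests.

```python
class Service:
    """Tracks the artifact versions deployed for one service."""

    def __init__(self, initial):
        self.history = [initial]

    @property
    def current(self):
        return self.history[-1]


def deploy_with_validation(service, candidate, validate):
    """Deploy candidate; roll back to the previous version if validation fails."""
    service.history.append(candidate)
    if validate(candidate):
        return True
    service.history.pop()  # fail fast: restore the previous known-good version
    return False


def mtu_check(artifact):
    """Hypothetical post-deployment check: the MTU must not exceed 1500."""
    return artifact["mtu"] <= 1500


svc = Service({"version": 1, "mtu": 1500})
print(deploy_with_validation(svc, {"version": 2, "mtu": 9000}, mtu_check),
      svc.current["version"])  # -> False 1
print(deploy_with_validation(svc, {"version": 3, "mtu": 1400}, mtu_check),
      svc.current["version"])  # -> True 3
```

The point is that a bad version never stays in place: a failed validation puts the service back on the last version that passed.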
DevOps is the safety harness for automation, but that does not mean that you cannot start automating without it. In fact, very few organizations are able to implement a righteous DevOps pipeline in one go. The DevOps journey should be viewed as a stepwise approach to delivering business value. In this blog, we have provided some criteria for how to determine what to automate in order to achieve maximum value. You should not automate just to automate. Start with automation that addresses your critical business needs that fall into the categories outlined above, but always keep an eye on the end goal of building toward DevOps processes and, ultimately, business transformation.
What’s next?
Well, instead of writing blogs over the past six months, we’ve been working with a team of people to create a CI/CD pipeline for SD-WAN. Not a set of examples of what you could do, but a fully functional, operationally righteous framework that we and our customers use in day-to-day operations. To get there, we wrote Ansible modules for Viptela, VIRL, NFVIS, and pyATS that automate every step of the process. We’ll cover this in our next blog, followed by an in-depth treatment of each component and how to consume it individually on your stepwise path to DevOps.