With unprecedented increases in network complexity and scale, AI is no longer just “a nice to have” – it is becoming essential to helping NetOps teams maintain service and network assurance. Network strategists already know this: More than 50% identify AI as a priority investment needed to deliver their ideal network.
AI: What can’t it do?
However, there are also plenty of overblown expectations. As the engineering lead on AI for networking at Cisco, I often find myself in conversations about very futuristic and somewhat unrealistic AI-enabled scenarios. It can be quite entertaining – but we also need to remember that today’s AI technology is not a panacea for every networking ailment.
For now, and for the next few years, AI will fully automate only a limited set of straightforward use cases. In most cases that require more complex and flexible analysis, AI will simply help human operators make quantifiably better and faster decisions.
AI: What can it do?
So, what can AI help us do today? One of the most common AI techniques, machine learning (ML), offers unique capabilities that operators can use to assure required network performance.
ML algorithms are certainly very powerful, but they also have a reputation for being difficult to design, tune, and adapt to a variety of situations, and they sometimes produce results that are difficult to interpret.
With Cisco AI Network Analytics, we have created a learning platform that solves issues where ML provides an indisputable and impactful benefit for network operators over existing technologies and approaches. This is possible thanks to the combination of two factors: (1) decades of experience in building some of the world’s largest and most advanced networks and (2) deep expertise in ML algorithms that can effectively process networking data.
AI and ML have some useful applications
Let’s look at one of the more useful ML use cases – complex event processing. When applying ML to network telemetry, it is possible to establish dynamic baselines of what constitutes normal operating conditions for a given intent.
For example, the ML model(s) may be used to predict the lower and upper bounds for a given KPI, such as Wi-Fi on-boarding times. On-boarding refers to the set of complex tasks triggered when a wireless client attempts to join a wireless network. Joining a network successfully and seamlessly contributes significantly to the Quality of Experience for the end user. Monitoring such complex, multidimensional KPIs to detect abnormal on-boarding times, and determining potential root causes should an issue occur, is a fundamental task for IT teams.
In this instance, ML makes it possible to compute models that predict the upper and lower bounds of the KPIs for on-boarding. KPIs falling outside the range predicted by the ML model would be considered “abnormal” for that particular network, and thus would be candidates for raising an alarm (that is, an alarm based on a learned bound rather than on a static value).
The figure below shows a predicted “band” (shown in green) of normal values for the percentage of failed onboarding sessions. As we can see, at some point the percentage of failed onboarding sessions (blue line) became abnormal (falling outside the green band), given the network variables analyzed by the ML algorithm in use. This departure from normal to abnormal behavior for this network is denoted by the red section of the timeline in the diagram.
Predicted range of normal values for the percentage of failed onboarding sessions
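To make the idea of alarming on a learned bound rather than a static value more concrete, here is a minimal sketch in Python. It is not the algorithm used in Cisco AI Network Analytics: a simple rolling mean plus or minus k standard deviations stands in for the ML model, and the function names, window size, and sample values are all hypothetical.

```python
from statistics import mean, stdev

def learned_band(history, k=3.0):
    """Derive (lower, upper) bounds from recent samples: mean +/- k std devs.

    A stand-in for an ML model's predicted band (assumption, not Cisco's method).
    """
    mu = mean(history)
    sigma = stdev(history)
    # A failure percentage cannot be negative, so clamp the lower bound at 0.
    return max(0.0, mu - k * sigma), mu + k * sigma

def flag_anomalies(series, window=12, k=3.0):
    """Return (index, value, lower, upper) for samples outside the band
    learned from the preceding `window` samples."""
    flagged = []
    for i in range(window, len(series)):
        lower, upper = learned_band(series[i - window:i], k)
        if not (lower <= series[i] <= upper):
            flagged.append((i, series[i], lower, upper))
    return flagged

# Hypothetical percentage of failed onboarding sessions per time interval.
failed_pct = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2,
              1.9, 2.1, 2.0, 9.5, 2.2]

anomalies = flag_anomalies(failed_pct)
for index, value, lower, upper in anomalies:
    print(f"interval {index}: {value:.1f}% outside [{lower:.2f}, {upper:.2f}]")
```

With this sample data, only the spike at interval 13 falls outside the band learned from the preceding intervals. A static threshold of, say, 10% would have missed it, which is the practical difference between a learned bound and a fixed one.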
A second ML use case that has a lot of potential is correlated insights. ML can provide deeper insights and visibility into the operation of the network and even help predict when an anomalous condition is likely to occur in the future.
A third important use case is root-cause analysis. In some cases, a single ML algorithm may be able to both detect anomalies and identify their root causes, whereas in other situations additional ML algorithms may be used alongside anomaly detection to identify root causes.
IBN and AI as disrupters
AI and advanced networking technologies like IBN are disrupting how things are done, especially for networking operations. Testing of new applications can be done in minutes instead of weeks. Troubleshooting gets significantly easier when an assurance engine identifies root causes and recommends fixes. In fact, when armed with powerful dashboards that offer actionable insights, a future network operator may only need to look in a handful of places, as opposed to plowing through heaps of possible causes.
The intent-based networking (IBN) vision is that network teams will simply define the required behavior, and the network will know how to continuously align itself with what the business needs.