In the not-too-distant past, everything in the application and networking stack was under IT’s control. Workloads lived securely in the on-premises data center, people worked from campus offices connected to the secure wireless network, and an MPLS service backed by an SLA connected branch offices to the data center.
Today, workforce productivity depends on cloud and SaaS applications that often run on public cloud infrastructure. That infrastructure relies on the internet for part or all of the WAN connectivity, and the internet in turn depends on a multitude of ISPs, CDNs, and advanced network services. Hybrid and cloud-native applications are mostly containerized, so performance can be affected by the communication paths among microservices, both in the data center and in the cloud. The total application experience perceived by the workforce depends on all the application components and network connections performing in concert. If one element falters, the whole experience can suffer.
NetOps and DevOps need to understand the interdependencies among component applications and tune the enterprise network and internet paths accordingly. Only a network fabric that monitors and analyzes the full stack of interlocking components can provide a unifying view, from the foundational network data layer to the software-defined WAN to application containers in the cloud. With the workforce accessing applications from virtually everywhere, all the time, IT requires pervasive, real-time monitoring of network, internet, and application performance, along with auto-healing capabilities. This is Full-Stack Observability, driven by software-defined controllers and network analytics that enable action, policy, and automation.
Observability Begins with a Deep Historical View
To improve application experience, IT needs tools to record, analyze, and report on network and application activity at a massive scale to build a deep historical data set against which to apply AI and Machine Reasoning tools. Hybrid and cloud applications consist of multiple micro-components connected by east-west traffic in the data center or cloud service. Continuous monitoring and analysis are needed to optimize application experience because many inter-application communication issues are transitory and difficult to replicate. Application performance needs to be recorded for machine analysis to determine recurring issues and root causes. Full Stack Observability from the perspective of the application requires:
◉ Application end-user experience as measured by ThousandEyes, NetFlow, or AppDynamics;
◉ A dependency graph linking the application to its underlying composite application services and infrastructure;
◉ Comprehensive availability and performance data on each of the supporting components such as composite application services, public cloud services, ISPs, networking devices, compute and storage infrastructure.
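The following Python sketch is a rough illustration of how a dependency graph supports root-cause analysis. It walks a hypothetical dependency map from an application through every component the application relies on, directly or transitively, and lists the unhealthy ones. The component names, the health data, and the `unhealthy_dependencies` helper are all invented for this example. In practice the data would come from monitoring tools such as those listed above. No product API is modeled here.

```python
from collections import deque

# Hypothetical dependency map: each node lists the components it directly relies on.
DEPENDENCIES: dict[str, list[str]] = {
    "crm-app": ["auth-svc", "orders-svc"],
    "auth-svc": ["cloud-db", "isp-a"],
    "orders-svc": ["cloud-db", "dc-switch-1"],
    "cloud-db": ["isp-b"],
    "isp-a": [],
    "isp-b": [],
    "dc-switch-1": [],
}

# Hypothetical health snapshot from monitoring; True means the component is healthy.
HEALTH: dict[str, bool] = {
    "auth-svc": True,
    "orders-svc": True,
    "cloud-db": True,
    "isp-a": True,
    "isp-b": False,
    "dc-switch-1": True,
}


def unhealthy_dependencies(app: str,
                           deps: dict[str, list[str]],
                           health: dict[str, bool]) -> list[str]:
    """Breadth-first walk from an application to every component it depends on,
    directly or transitively, returning the unhealthy ones in visit order.
    Components missing from the health snapshot are assumed healthy."""
    seen: set[str] = set()
    queue = deque(deps.get(app, []))
    suspects: list[str] = []
    while queue:
        node = queue.popleft()
        if node in seen:          # shared dependencies (e.g. cloud-db) are checked once
            continue
        seen.add(node)
        if not health.get(node, True):
            suspects.append(node)
        queue.extend(deps.get(node, []))
    return suspects


print(unhealthy_dependencies("crm-app", DEPENDENCIES, HEALTH))  # ['isp-b']
```

Because the walk is breadth-first, this sketch reports suspects closer to the application first. A real system would also weigh how closely each component's degradation correlates with the user-facing symptoms.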
The irony of having mountains of telemetry and activity logs awaiting analysis by overworked IT teams is that there is too much noise in too much data for humans to deal with in a timely manner. When the volume of data is beyond human scale and the signals are too subtle for humans to notice, machine reasoning (MR) can automate the analysis of trillions of bytes of switch and router telemetry, wireless radio fingerprints, and access point interference data. It uncovers patterns in the chaos and turns the findings into actionable insights and automated mitigation actions.
Automated Observability with AI Network Analytics
To make full use of the deep historical and real-time data, IT can take advantage of an Analytics Stack that can:
◉ Use purpose-built applications that augment human engineers in NetSecOps with insights into network performance and security vulnerabilities;
◉ Leverage machine-speed analytics and a knowledgebase-driven Machine Reasoning Engine (MRE) to relieve NetSecOps of mundane monitoring tasks so they can focus on proactive digital transformation projects with DevOps;
◉ Collect, store, and parse massive, diverse data lakes (collections of anonymized network and application telemetry characterized by the volume, velocity, and variety of the data) to compare performance and security metrics.
For several decades, Cisco has been building a data lake of worldwide, anonymized customer telemetry in parallel with a knowledgebase of expert troubleshooting experience, both of which are available to machine reasoning algorithms under the command and control of Cisco DNA Center. With Cisco AI Network Analytics, NetOps can, for example, be forewarned of increases in Wi-Fi interference, network bottlenecks, uneven device onboarding times, and office traffic loads in the more traditional data center and campus network environments.
Observability for cloud-based applications, however, needs a different approach as much of the application infrastructure is not under direct control of IT. Direct internet connections to clouds can be unreliable—especially for latency-sensitive applications—unless they are monitored and automatically tuned using cloud onramps.
Observability into Cisco Cloud OnRamps covers each of the major cloud services (Microsoft Azure, Amazon AWS, and Google Cloud) as well as colocation and SaaS platforms. It provides the ability to monitor and set performance parameters that are automatically applied to maintain the proper quality of service for each type of application and cloud provider. Paths are calculated by tracking characteristics including packet loss, latency, and jitter in the data plane tunnels between cloud workloads and edge devices. Cisco AppDynamics and ThousandEyes provide application-layer observability into inter-cloud and intra-cloud dynamics, enabling NetOps and DevOps to monitor and identify factors affecting application experience.
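The Python sketch below makes the path-calculation idea more concrete with a simple form of threshold-based path selection. It discards tunnels whose measured loss, latency, or jitter exceeds a per-application threshold, then prefers the lowest-latency path that remains. Several elements are assumptions for illustration only: the tunnel names, the measurements, the thresholds, and the "lowest latency wins" tie-breaker. This is not the algorithm Cisco Cloud OnRamp actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathStats:
    """Measured data-plane statistics for one tunnel (hypothetical sample values)."""
    name: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass
class SlaClass:
    """Hypothetical per-application thresholds that a path must meet."""
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def meets_sla(p: PathStats, sla: SlaClass) -> bool:
    """A path qualifies only if loss, latency, and jitter are all within limits."""
    return (p.loss_pct <= sla.max_loss_pct
            and p.latency_ms <= sla.max_latency_ms
            and p.jitter_ms <= sla.max_jitter_ms)


def choose_path(paths: list[PathStats], sla: SlaClass) -> Optional[PathStats]:
    """Return the lowest-latency path that meets the SLA, or None if none qualify."""
    compliant = [p for p in paths if meets_sla(p, sla)]
    if not compliant:
        return None
    return min(compliant, key=lambda p: p.latency_ms)


paths = [
    PathStats("mpls", loss_pct=0.1, latency_ms=45.0, jitter_ms=3.0),
    PathStats("inet-isp-a", loss_pct=0.5, latency_ms=30.0, jitter_ms=12.0),
    PathStats("inet-isp-b", loss_pct=0.2, latency_ms=38.0, jitter_ms=4.0),
]
# Hypothetical thresholds for a latency-sensitive application.
realtime = SlaClass(max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=10.0)

best = choose_path(paths, realtime)
print(best.name if best else "no compliant path")  # inet-isp-b
```

Here inet-isp-a has the lowest latency but is rejected for excessive jitter. When no path qualifies, the function returns `None`, which leaves the fallback decision to the caller. For example, the caller could stay on a default path and raise an alert. This keeps the selection rule itself simple.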
Network Analytics + Software-Driven Controllers = Full Stack Observability
AI Network Analytics working in conjunction with Software-Driven Controllers enables Full Stack Observability. Operational intents and security policies defined in software-driven controllers are compared with telemetry and operational anomalies detected by an MRE to automatically adjust operations or isolate rogue devices. Always-on AI Analytics watch over the distributed workforce and workloads at machine speed. They make automatic adjustments or send alerts with suggested remediations to the appropriate levels of IT personnel, or to SIEM applications to log events and kick off trouble tickets. Over time, NetOps and DevOps can fine-tune application performance using a consistent flow of insights from analytics to adapt to changes in workloads, workforce, and workplace.
The next shift is using AI and an MRE to personalize recommendations on updates and patches for controllers. Upgrading controllers carries a certain risk, given the complexity of existing network configurations and the many differences among them. Knowing in advance what effect an update will have, and whether it even applies to the existing configuration, can bring peace of mind to the process. Does a specific configuration warrant a patch? If the issue the patch addresses is not relevant to that configuration, there is no reason to force an unneeded update. Are controllers running an OS version with active PSIRT vulnerabilities? If so, NetOps is alerted to prioritize upgrading those specific controllers. Automation and observability go hand in hand to make operations teams more efficient, so they can spend time on more valuable tasks.
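The applicability question above can be sketched in a few lines of Python. In this sketch, an advisory is relevant to a controller only when the controller runs an affected OS version and also has the affected feature enabled. Controllers with no matching advisories are left alone. The advisory IDs, version strings, feature names, and the matching rule itself are hypothetical simplifications. Real PSIRT advisories carry much richer applicability conditions, and this code does not query any real advisory feed.

```python
from dataclasses import dataclass, field


@dataclass
class Advisory:
    """Hypothetical advisory: the OS versions it affects and the feature it depends on."""
    advisory_id: str
    affected_versions: set[str]
    required_feature: str  # the vulnerability matters only if this feature is enabled


@dataclass
class Controller:
    """Hypothetical controller inventory record."""
    name: str
    os_version: str
    enabled_features: set[str] = field(default_factory=set)


def upgrade_priorities(controllers: list[Controller],
                       advisories: list[Advisory]) -> dict[str, list[str]]:
    """Map each controller name to the advisory IDs that actually apply to it.

    An advisory applies only when the controller runs an affected version AND
    has the relevant feature enabled; an empty list means no forced update.
    """
    result: dict[str, list[str]] = {}
    for c in controllers:
        result[c.name] = [
            a.advisory_id
            for a in advisories
            if c.os_version in a.affected_versions
            and a.required_feature in c.enabled_features
        ]
    return result


advisories = [
    Advisory("ADV-001", {"2.1.0", "2.1.1"}, "guest-portal"),
    Advisory("ADV-002", {"2.3.0"}, "ipv6"),
]
controllers = [
    Controller("ctrl-hq", "2.1.0", {"guest-portal", "ipv6"}),
    Controller("ctrl-branch", "2.1.1", {"ipv6"}),  # affected version, feature not in use
    Controller("ctrl-lab", "2.3.0", {"ipv6"}),
]

for name, ids in upgrade_priorities(controllers, advisories).items():
    status = "upgrade first: " + ", ".join(ids) if ids else "no applicable advisories"
    print(f"{name}: {status}")
```

The ctrl-branch entry shows the point made above. It runs an affected version, but because the affected feature is not enabled, this sketch does not flag it for a forced update.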
Observability Provides Operational Simplicity and Serviceability
Full-stack observability is the foundation of a new network and security operating model that ensures application experience and trust. The ultimate outcome of attaining full-stack observability is to enable all the operations teams (NetOps, SecOps, DevOps, and CloudOps) to work together to raise the level of serviceability across the application infrastructure. Automations that support full-stack observability also simplify operations by eliminating many of the time-consuming and tedious tasks of network monitoring and troubleshooting.
Source: cisco.com