Sunday 27 September 2020

Introduction to Programmability – Part 1

Are you a network engineer and have had to repeat the same boring task at work, every day? Do you feel that there must be a way for you to do a task once, and then “automate” it? Theoretically, an infinite number of times? Or, have you been spending more time cleaning up and correcting configuration mistakes than you spend implementing those configurations? Or maybe you have been hearing a lot about this hot new “thing” called network programmability, but in the middle of the hype, could not figure out what exactly it is?

If any of those cases (and many others) apply to you, then you are in the right place. The fact that you are here, reading this now, means you know that there is probably a solution to your problem(s) in the realm of automation and/or programmability. In this case, buckle up because you are in for a ride!

If you are a network engineer and browsed to this page by mistake, I still urge you to read on. Netflix, YouTube, Facebook and Twitter will still be there when you are done. (Or not.) This is more fun. Trust me!

A Few Definitions For The Road

Before we dive into the nuances of network programmability and automation, let’s clear up some confusion. I hate nothing in the world more than definitions – well, maybe greasy pizza – but this is a necessary evil! In order to start clean, you must understand each of the following: network management, automation, orchestration, modeling, programmability and APIs.

Network Management is an umbrella term that covers the processes, tools, technologies, and job roles, among other things, required to manage a network and the lifecycle of the services offered by that network.

Many standards and frameworks exist today to define the different components of network management. One of them is FCAPS, where the acronym stands for Fault, Configuration, Accounting, Performance and Security Management. FCAPS is geared towards managing the systems that constitute the network.

Another is ITIL. The acronym stands for Information Technology Infrastructure Library and covers an extensive number of practices for IT Services Management (ITSM), which is basically the lifecycle of the services provided by the network. ITIL is divided into 5 major practices: Service Strategy, Service Design, Service Transition, Service Operation, and Continual Service Improvement. Each practice is divided into smaller sub-practices. For example, Service Design includes Capacity Management, Availability Management and Service Catalogue Management while Service Operation includes Incident Management, Problem Management and Request Fulfilment. Some people make a career being ITIL practitioners.

The Merriam-Webster dictionary defines Automation as “the technique of making an apparatus, a process, or a system operate automatically”. In other words, having a system of some sort do the work for you, work that you would otherwise do manually. However, you will have to tell this system what it is that you want to get done, and sometimes, how to do it.

So, configuring a network of routers with dynamic routing protocols, so that these routers speak with each other and figure out the shortest path per destination, is a form of automation. The alternative would be having someone do the calculations manually on a piece of paper, and then configuring static routes on each router. And so is writing a program that configures a VLAN on your switch – or your 500 switches – without someone having to log in to each switch individually and configure the VLAN via the CLI.

As you have already guessed, the power of automation is not intrinsically in the automation itself. Logging into one switch manually and configuring one VLAN is probably much faster than writing a Python program to do that for you. So why automate? Obviously, the importance of automation is its application to repetitive tasks.

Automation will not only save the time you would spend repeating a task, it also maintains consistency and accuracy in performing that task, over all its iterations. It does not matter if you have 10 or 500 switches. The program you wrote will always go through the exact same steps for every switch, with the exact same result. Every time. Of course, the assumption here is that no errors will happen because of factors external to your program, such as an unreachable switch, wrong credentials configured on a switch, or a switch with a corrupted IOS. That said, you can write a program to detect and mitigate these error conditions!
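To make this concrete, here is a minimal sketch of the "same steps for every switch" idea, including one of the external error conditions mentioned above. The switch names, the `push` callback and the `fake_push` stand-in are all hypothetical; a real program would open an SSH or API session where `fake_push` is used here.

```python
from typing import Callable, Dict, List


def vlan_commands(vlan_id: int, name: str) -> List[str]:
    """Return the CLI lines that create one VLAN (IOS-style syntax)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return ["configure terminal", f"vlan {vlan_id}", f"name {name}", "end"]


def configure_all(
    switches: List[str],
    vlan_id: int,
    name: str,
    push: Callable[[str, List[str]], None],
) -> Dict[str, str]:
    """Apply the exact same commands to every switch and record the outcome."""
    commands = vlan_commands(vlan_id, name)
    results: Dict[str, str] = {}
    for switch in switches:
        try:
            push(switch, commands)
            results[switch] = "ok"
        except ConnectionError as exc:
            # An unreachable switch must not stop the remaining switches.
            results[switch] = f"failed: {exc}"
    return results


def fake_push(switch: str, commands: List[str]) -> None:
    """Hypothetical stand-in for a real session; 'sw3' simulates an outage."""
    if switch == "sw3":
        raise ConnectionError("unreachable")


report = configure_all(["sw1", "sw2", "sw3"], 100, "USERS", fake_push)
print(report)
```

Whether the list holds 3 switches or 500, the loop body does not change, and the report tells you which devices need a second look.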

When you have several systems working together to get a job done, there is typically a need for a system, or a function, to coordinate the execution of the tasks performed by the different systems towards getting this job completed. This coordination function is called Orchestration.

For example, a private or public cloud that provides virtual machines to its users will include different systems to provision the network, compute, virtualization, operating systems, and maybe the applications, for those VMs. Orchestration will provide the function of coordination between all the different systems and applications to get the VM up and running.

Automation and orchestration work well in tandem. Automation covers single tasks. Using software to configure a VLAN on a switch is automation, and so is provisioning a VM over ESXi, or installing Linux on that VM. Orchestration, on the other hand, is the function of coordinating the execution of these automated tasks, in a specific sequence, each task using its own software and each on its respective system. The scope of automation involves single tasks. The scope of orchestration involves a workflow of tasks.
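The difference in scope can be sketched in a few lines. Each function here is one automated task, and `orchestrate` is the coordination function that runs them as a workflow. The task names, the shared `ctx` dictionary and the values stored in it are assumptions made up for this illustration.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Task = Tuple[str, Callable[[Context], None]]


def configure_vlan(ctx: Context) -> None:
    """Automated task on the network system."""
    ctx["vlan"] = 100


def provision_vm(ctx: Context) -> None:
    """Automated task on the virtualization system; needs the VLAN first."""
    if "vlan" not in ctx:
        raise RuntimeError("VLAN must exist before the VM is provisioned")
    ctx["vm"] = "vm-01"


def install_os(ctx: Context) -> None:
    """Automated task on the new VM; needs the VM first."""
    if "vm" not in ctx:
        raise RuntimeError("VM must exist before the OS is installed")
    ctx["os"] = "linux"


def orchestrate(workflow: List[Task]) -> Tuple[Context, List[str]]:
    """Run the tasks in order, passing shared state from one to the next."""
    ctx: Context = {}
    completed: List[str] = []
    for name, task in workflow:
        task(ctx)
        completed.append(name)
    return ctx, completed


workflow = [("vlan", configure_vlan), ("vm", provision_vm), ("os", install_os)]
state, completed = orchestrate(workflow)
print(completed, state)
```

Run the tasks out of order and the workflow fails, which is exactly the sequencing problem orchestration exists to solve.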

The concept of Modeling Data is not a new concept and is not exclusive to networks or even automation. Data modeling is very involved and is a major branch of data science. For the humble purpose of this blog, let’s use an example to demonstrate what a model is. In Example 1, you can see a configuration snippet of BGP on an IOS-XR router in the left column. In the right column, the specific values for this particular device were removed and replaced with a description of what should be there. A template of sorts.

Example 1 – BGP configuration snippet on IOS-XR and the corresponding data model in tree notation

As you can see, a model is a little more than just a template.

A model describes data types. An IP address is composed of four octets separated by the “.” character and each octet has a value between 0 and 255. A 2-byte ASN is an integer between 1 and 65535 (4-byte ASNs extend this range). These two objects, an IP address and an ASN, are leaf objects, each having a specific type, and each instance of that leaf has a value.

As you have already guessed, a model also describes the data hierarchy. Address families are child objects of the main BGP process. Then the networks injected for a particular address family are child objects of that address family. And the same applies to neighbors: there are global neighbors that are children of the BGP process, and then there are neighbors defined under the different VRFs. You get the point.

So, in order to reflect hierarchy, other object types may exist in a model besides a leaf, such as leaf-lists, lists and containers. A leaf-list is a list of leafs. For example, an SNMP server configured on a router is a leaf object. A list of SNMP servers makes up a leaf-list. All leafs under a leaf-list are of the same type.

A list is a group of other objects and has many instances. For example, the VRF in Example 1 is a list. It has child objects of different types (some of which are themselves lists), and at the same time you may have more than one VRF configured under the BGP process. A container is a group of objects of different types, but a container will only have a single instance. An example of a container is the BGP process itself. This is an oversimplification of what a data model is, just to elaborate on the concept.
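If it helps to see those four object types side by side, here is a rough Python sketch of the BGP hierarchy described above. It is not YANG and not the model from Example 1; the class names, field names and sample values are assumptions chosen for illustration, and the ASN range is the 2-byte range used in the text.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List


def check_asn(asn: int) -> int:
    """Leaf type check for a 2-byte ASN, the range used in the text."""
    if not 1 <= asn <= 65535:
        raise ValueError(f"ASN out of range: {asn}")
    return asn


@dataclass
class Neighbor:
    """List entry: many instances may exist under one parent."""
    address: str    # leaf of type IPv4 address
    remote_as: int  # leaf of type ASN

    def __post_init__(self) -> None:
        ipaddress.IPv4Address(self.address)  # raises ValueError if malformed
        check_asn(self.remote_as)


@dataclass
class Vrf:
    """List entry with children of its own."""
    name: str
    neighbors: List[Neighbor] = field(default_factory=list)


@dataclass
class Bgp:
    """Container: a single instance holding objects of different types."""
    asn: int
    networks: List[str] = field(default_factory=list)  # leaf-list, one type
    neighbors: List[Neighbor] = field(default_factory=list)
    vrfs: List[Vrf] = field(default_factory=list)

    def __post_init__(self) -> None:
        check_asn(self.asn)
        for network in self.networks:
            ipaddress.IPv4Network(network)  # raises ValueError if malformed


bgp = Bgp(
    asn=65001,
    networks=["10.1.0.0/16", "10.2.0.0/16"],
    neighbors=[Neighbor("192.0.2.1", 65002)],
    vrfs=[Vrf("CUSTOMER-A", [Neighbor("198.51.100.1", 65010)])],
)
print(bgp.vrfs[0].neighbors[0].address)
```

The type checks are the part a plain template cannot give you: a value such as `10.0.0.300` is rejected before it ever reaches a device.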

Data models used in the arena of network programmability are described using a language called YANG. A “YANG model” is nothing more than a data model described using YANG.

Defining Programmability is not as easy as the previous terms. The reason for this is that the term is used across a very wide spectrum, and means different things depending on what context it is used in. Programming a device or a system basically means giving it instructions to do what you want it to do. A programmable device is a device that can execute different tasks based on the instructions it is given. In the world of electronics, an ASIC (Application Specific Integrated Circuit) is a chip that does one specific function. If this ASIC is built to accept two numbers as input and add those two numbers, it will always do that. A Microprocessor, on the other hand, accepts instructions describing what you want it to do with the input it is given. You can program it to add two numbers, multiply them, or subtract one from the other. A microprocessor is a programmable device while an ASIC is not.

But doesn’t this equate programming to configuration? Configuring a network device is basically telling that device what to do … right? Well, that is a tricky question, and it is here that we discuss programmability as used today in the context of network automation.

Programmability for the purpose of network automation is basically the capability to retrieve data, whether configuration or operational data, from a system, or push configuration to a system, using an Application Programming Interface, or API for short. An API is an interface to a system that is designed for software interaction with this system. Contrast this to a router CLI. A CLI requires human interaction to be useful. An API on that same router would be designed so that a Python program, for example, can interact with the router without any human intervention.

But what really is an API?

An API is software running on a system. This software provides a particular function to other software, while not exposing this other software to how this function is performed. An API will typically have a predefined way to reach it, for example, over a specific TCP port. The API will also define the format of the data that it accepts, as well as the data it sends back. It may also define different message types and specific syntax and semantics for using the services provided by the API. A device that implements an API is said to expose an API to be consumed by other software.

Now to connect the dots. Orchestration coordinates a number of automated tasks in a workflow to implement one of the disciplines of network management. Automation may leverage an API exposed by a device in order to manage this device programmatically. When a data model is used as a reference during programmatic access, the device is said to leverage model-driven programmability.

Quite a mouthful!

Network Programmability: The Details


Network Programmability as a practice is best summarized by the Programmability Stack in Figure 1 below. The stack defines six layers:

1. Application
2. Model
3. Protocol
4. Encoding
5. Transport
6. Infrastructure

Figure 1 – The Network Programmability Stack

(Disclaimer: Unlike the OSI 7-Layer Model, the Programmability Stack has not been standardized by any industry-recognized entity. Therefore, you will probably run into a number of variations of this stack as you progress in your studies of programmability. I found that the one I have drawn here is the best version for the sake of getting the point across. Feel free to contrast it with other versions you find elsewhere and tell me what you think in the comments below.)

At the top of the stack is an application that may be a simple Python script or a sophisticated network management system such as Cisco Prime. At the bottom of the stack is the device exposing an API. In order for the application to programmatically speak with the device’s API, it will leverage a choice of components at the different layers of the stack.

The application will choose a model. Different types of models exist, the majority today described in YANG. A model may be vendor-specific or standards-based. In either case, the model will define the structure of the data that the application sends to or receives from the device (through the API).

The application will have to choose a protocol that defines the message types as well as the syntax and semantics used in those messages. There are three primary protocols used today for network programmability: NETCONF, RESTCONF and gRPC.

NETCONF, for example, defines three message types: hello, rpc and rpc-reply. It also defines specific operations that may be used in the rpc message to perform tasks such as retrieving operational data from a device or pushing configuration to a device. The messages will use specific, well-defined syntax.
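As a taste of that syntax, the sketch below builds an rpc message carrying the `get-config` operation, using only the Python standard library. The function name `build_get_config` is my own; the element names and the base namespace are the ones NETCONF defines. Nothing is sent anywhere here, the code only produces the XML text.

```python
import xml.etree.ElementTree as ET

# Base namespace used by NETCONF messages.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_get_config(message_id: str, source: str = "running") -> str:
    """Build an <rpc> message asking for one configuration datastore."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")
    source_element = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    # The datastore is named by an empty child element, e.g. <running/>.
    ET.SubElement(source_element, f"{{{NETCONF_NS}}}{source}")
    return ET.tostring(rpc, encoding="unicode")


request = build_get_config("101")
print(request)
```

ElementTree writes the namespace with a generated prefix, which is equivalent XML to the unprefixed form you will usually see in NETCONF examples. The device would answer with an rpc-reply carrying the same message-id.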

The protocols themselves are sometimes described as the APIs. Don’t get confused just yet! When a device exposes a NETCONF API, then the application will have no choice but to use the NETCONF protocol to speak with the device. The same applies to the other protocols.

A protocol will have a choice of a data format, typically referred to as the data encoding. The most common data encodings in use today are XML, JSON, YAML and GPB. For example, NETCONF will send and receive data only in the form of XML documents. RESTCONF supports both XML and JSON.
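Encoding is easier to appreciate when the same data is written two ways. In this small sketch, the `interface` dictionary and its field names are made up for illustration and are not taken from any particular YANG model.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical data; the field names are not from any specific model.
interface = {"name": "GigabitEthernet1", "enabled": True, "mtu": 1500}

# JSON encoding of the data.
as_json = json.dumps({"interface": interface}, sort_keys=True)

# XML encoding of the same data, one child element per field.
root = ET.Element("interface")
for key, value in interface.items():
    child = ET.SubElement(root, key)
    child.text = str(value).lower() if isinstance(value, bool) else str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

The content is identical in both cases; only the representation on the wire changes, which is why the encoding sits at its own layer of the stack.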

Then this data will be transported back and forth between the application and the device that is exposing the API using a transport protocol. For example, NETCONF uses SSH while RESTCONF uses HTTP.

This is network programmability in a nutshell!

Saturday 26 September 2020

Automated response with Cisco Stealthwatch

Cisco Stealthwatch provides enterprise-wide visibility by collecting telemetry from all corners of your environment and applying best in class security analytics by leveraging multiple engines including behavioral modeling and machine learning to pinpoint anomalies and detect threats in real-time. Once threats are detected, events and alarms are generated and displayed within the user interface. The system also provides the ability to automatically respond to, or share alarms by using the Response Manager. In release 7.3 of the solution, the Response Management module has been modernized and is now available from the web-based user interface to facilitate data-sharing with third party event gathering and ticketing systems. Additional enhancements include a range of customizable action and rule configurations that offer numerous new ways to share and respond to alarms to improve operational efficiencies by accelerating incident investigation efforts. In this post, I’ll provide an overview of new enhancements to this capability.

Benefits: 

◉ The new modernized Response Management module facilitates data-sharing with third party event gathering and ticketing systems through a range of action options.

◉ Save time and reduce noise by specifying which alarms are shared with SecureX threat response.

◉ Automate responses with pre-built workflows through SecureX orchestration capabilities.

The Response Management module allows you to configure how Stealthwatch responds to alarms. The Response Manager uses two main functions:

◉ Rules: A set of one or multiple nested condition types that define when one or multiple response actions should be triggered.

◉ Actions: Response actions that are associated with specific rules and are used to perform specific types of actions when triggered.
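To show how the two functions relate, here is a toy model of rules and actions in Python. It does not use any Stealthwatch API; the `Rule` class, the alarm fields, the host group name and the action names are all hypothetical and only mirror the concepts described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Alarm = Dict[str, str]
Condition = Callable[[Alarm], bool]


@dataclass
class Rule:
    """A named set of conditions plus the actions to trigger on a match."""
    name: str
    conditions: List[Condition]
    actions: List[str]

    def matches(self, alarm: Alarm) -> bool:
        # Every condition must hold for the rule to fire.
        return all(condition(alarm) for condition in self.conditions)


def respond(alarm: Alarm, rules: List[Rule]) -> List[str]:
    """Return the actions of every rule that the alarm satisfies."""
    triggered: List[str] = []
    for rule in rules:
        if rule.matches(alarm):
            triggered.extend(rule.actions)
    return triggered


employee_rule = Rule(
    name="Compromised employee device",
    conditions=[
        lambda a: a["type"] in {"Remote Access Breach", "Bot Infected Host"},
        lambda a: a["host_group"] == "Employees",
    ],
    actions=["ise_anc_policy", "threat_response_incident", "webhook"],
)

alarm = {"type": "Bot Infected Host", "host_group": "Employees"}
print(respond(alarm, [employee_rule]))
```

An alarm that fails any one condition, for example one raised for a different host group, triggers nothing.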

Response Management module Rule types consist of the six alarms depicted above.

Alarms generally fall into two categories:


Threat response-related alarms:

◉ Host: Alarms associated with core and custom detections for hosts or host groups such as C&C alarms, data hoarding alarms, port scan alarms, data exfiltration alarms, etc.

◉ Host Group Relationship: Alarms associated with relationship policies or network map-related policies such as high traffic, SYN flood, round trip time, and more.

Stealthwatch appliance management-related alarms:

◉ Flow Collector System: Alarms associated with the Flow Collector component of the solution such as database alarms, RAID alarms, management channel alarms, etc.

◉ Stealthwatch Management Console (SMC) System: Alarms associated with the SMC component of the solution such as RAID alarms, Cisco Identity Services Engine (ISE) connection alarms, and license status alarms.

◉ Exporter or Interface: Alarms associated with exporters and their interfaces such as interface utilization alarms, Flow Sensor alarms, flow data exporter alarms, and longest duration alarms.

◉ UDP Director: Alarms associated with the UDP Director component of the solution such as RAID alarms, management channel alarms, high availability alarms, etc.

Choose from the above Response Management module Action options.
 
Available types of response actions consist of the following:

◉ Syslog Message: Allows you to configure your own customized formats based on alarm variables such as alarm type, source, destination, category, and more for Syslog messages to be sent to third party solutions such as SIEMs and management systems.

◉ Email: Sends email messages with configurable formats including alarm variables such as alarm type, source, destination, category, and more.

◉ SNMP Trap: Sends SNMP trap messages with configurable formats including alarm variables such as alarm type, source, destination, category, etc.

◉ ISE ANC Policy: Triggers Adaptive Network Control (ANC) policy changes to modify or limit an endpoint’s level of access to the network when Stealthwatch is integrated with ISE.

◉ Webhook: Uses webhooks exposed by other solutions which could vary from an API call to a web triggered script to enhance data sharing with third-party tools.

◉ Threat Response Incident: Sends Stealthwatch alarms to SecureX threat response with the ability to specify incident confidence levels and host information.
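The idea of a message format built from alarm variables can be illustrated with Python's `string.Template`. The format string and the variable names here are assumptions for the sake of the example, not the syntax used by the Response Management module.

```python
from string import Template

# Hypothetical format; the placeholders stand in for alarm variables.
SYSLOG_FORMAT = Template(
    "alarm=$alarm_type src=$source dst=$destination category=$category"
)


def render(alarm: dict) -> str:
    """Fill the format with one alarm; a missing variable raises KeyError."""
    return SYSLOG_FORMAT.substitute(alarm)


message = render(
    {
        "alarm_type": "Port Scan",
        "source": "10.1.1.20",
        "destination": "10.2.2.0/24",
        "category": "Recon",
    }
)
print(message)
```

The same set of variables could just as well feed an email body or a webhook payload; only the template changes.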

The combination of rules and actions gives numerous possibilities on how to share or respond to alarms generated from Cisco Stealthwatch. Below is an example of a usage combination that triggers a response for employees connected locally or remotely in case their devices trigger a remote access breach alarm or a botnet infected host alarm. The response actions include isolating the device via ISE, sharing the incident to SecureX threat response and opening up a ticket with webhooks.

1) Set up rules to trigger when an alarm fires, and 2) Configure specific actions or responses that will take place once the above rule is triggered.

The ongoing growth of critical security and network operations continues to increase the need to reduce complexity and automate response capabilities. Cisco Stealthwatch release 7.3.0’s modernized Response Management module helps to cut down on noise by eliminating repetitive tasks, accelerate incident investigations, and streamline remediation operations through its industry-leading, high-fidelity, and easy-to-configure automated response rules and actions.

Friday 25 September 2020

New Technology for Cable Operators to Consider

In the last several years, the role of compute resources has increased the demands upon modern cable regional and access networks. Computation has quickly become part of network infrastructure itself, beyond just supporting services, over-the-top applications, and management tasks. At the same time, advancements in silicon and optical technology allow for a re-examination of cable network topology and service placement. This blog examines some key decision points the cable industry needs to consider as we work together to build the next generation of a Modern Cable Network.

The Growing Role of Compute

Computing has always played an important role in Internet systems. Network services such as DNS and SMTP, as well as applications such as web services, video cache, and the control planes of routers themselves, all depend on general-purpose compute systems being distributed in the network. Some of these compute resources are discrete servers, some are in large cloud computing environments, and still others are co-resident in routing devices. But they all share the same fundamental traits – they keep and maintain application and/or network state, they run generally available operating systems, and today, all use common x86-based CPUs.

As computational power has grown, the ability of compute resources to perform stateful transformation of data has found highly useful applications. In other words, a resource can take input from an app or the network, transform that input in some way, and return it in a more useful state. One example is real-time face recognition, such as identifying individuals in video streams. Raw video is fed into a resource, software analyzes the raw video, and returns a structured set of data. Another is real-time speech to text, such as that present on modern smartphones. Raw audio is fed into an application, software deciphers the language present, and returns ordered text to be fed into additional applications.

The key is that as computation is used for more real-time, stateful transformation of data, the ability to access those resources quickly and reliably becomes paramount. And this directly translates into the latency, or the amount of time on a wire, between the end-user and that compute resource. Ultimately, we’re talking about the speed of light.
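A quick back-of-the-envelope calculation shows why distance matters. Light in optical fiber travels at roughly two thirds of its speed in a vacuum, which works out to about 5 microseconds per kilometer. The refractive index used here is a typical value for single-mode fiber and is an assumption; the figures ignore queuing, serialization and processing delay.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.468     # assumed typical single-mode fiber value


def one_way_delay_ms(distance_km: float) -> float:
    """Propagation delay over fiber, in milliseconds, for one direction."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay for a request and its reply."""
    return 2 * one_way_delay_ms(distance_km)


for distance in (10, 100, 1000):
    print(f"{distance:>5} km: {round_trip_ms(distance):.3f} ms round trip")
```

Compute that sits 1000 km away costs close to 10 ms of round-trip time before any work is done, while compute 10 km away costs about a tenth of a millisecond.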

Low-latency access to real-time computation is among the most lucrative, and untapped, resources present on cable networks. Network technology is advancing to make the placement of computation in cable networks much more advantageous to this new opportunity.

Advent of New Network Technology


While demand for low-latency computing starts to grow, the cable industry faces some decisions. New network technology is permitting a massive disaggregation, and re-architecture, of cable access and metro networks.

Distributed Access Architecture (DAA) systems, such as Remote PHY, enable the pervasive use of IP and Ethernet transport in the access layer, where legacy HFC analog transmission was previously used.

Virtualized CCAP, such as Cisco’s Cloud Native Broadband Router (cnBR), leverages Remote PHY technology to build a scale-out, software-oriented, microservices-based analog to a contemporary CMTS. A key point of the cloud native software architecture of the cnBR is the use of the network to place all, or parts, of the system’s functionality anywhere the network topology extends.

Next-generation silicon, optics, and software. New routing platforms, such as the Cisco 8000 series, leverage next-generation forwarding ASIC technology to deliver unprecedented capacity and systems simplicity, all in a power- and space-efficient package. Coupled with emerging Digital Coherent Optics (DCO) technology such as 400G-ZR and ZR+ pluggables, it is possible to build a cable metro topology that is much more interconnected, with traffic patterns that follow the value of a dollar and not strictly the path of a wavelength. This means compute can be placed in arbitrary locations, wherever packet latency to it is optimal for the application.

A key compute resource that needs consideration for placement is the cnBR or Virtualized CCAP itself. A centralized vCCAP gains efficiency in software economies of scale. But a distributed vCCAP permits the opportunity to offload routable traffic closer to subscribers, which means closer to an edge compute or low-latency compute access architecture. Careful thought needs to be applied when designing the cnBR or vCCAP as a portion of overall network design and goals.

DOCSIS 4.0 also plays a role in a Modern Cable Network. To learn the latest about this standard, attend our webinar: DOCSIS 4.0 Evolution in the Cable Plant, Are You Ready. If you would like to chat more about architecting and designing the next generation of a Modern Cable Network, stop by our virtual exhibit at SCTE-ISBE Cable-Tec Virtual Expo.

Thursday 24 September 2020

Detecting and Mitigating Loops in VXLAN Networks

The Problem with Looping

First-generation Layer-2 Ethernet networks could not natively detect or mitigate looped topologies, while modern Layer-2 Overlays implicitly build loop-free topologies. Overlays do not have any need for loop detection and mitigation as long as no first-gen Layer-2 network is attached, which is common in complex data center networks. When loops occur, data frames can exist indefinitely, disrupting network stability and degrading performance. Loops introduce broadcast radiation, increasing utilization of CPU and network bandwidth, which results in a degradation of user application access experience. In multi-site networks a loop can span multiple data centers, causing disruptions that are difficult to pinpoint. In other words, loops are bad news. Before we look at how a modern network fabric minimizes looping, let’s examine previous attempts at preventing loops in topologies.

Spanning Tree Protocols (STP) counteract the loop problem in first-gen Layer-2 Ethernet networks. Over time, other approaches evolved by moving networks from “looped topologies” to “loop-free topologies”. This evolution reduced the dependence on Loop Prevention protocols, so they are now employed mostly as a failsafe mechanism. Today with Network Virtualization Overlays, the dependency on Loop Prevention protocols is almost entirely eliminated. However, even though virtualized overlay networks such as VXLAN EVPN are loop free, having a failsafe loop detection and mitigation method is still desirable because loops can be introduced by topologies connected to the overlay network.

Loop-free VXLAN overlays may be connected to an Ethernet segment that can result in network loops, requiring detection and mitigation in conjunction with the overlay.

Many Solutions to Loop Prevention, But Which is the Best?


The Spanning Tree Protocol enables network designs that include redundant links to provide fault tolerance but avoid the presence of bridging loops. STP builds a single tree that calculates the relationship of network nodes and bridges within a layer 2 network to avoid creating loops.

An alternate approach to prevent loops in layer 2 networks uses link bundles between two neighboring bridges. This technique improves performance (Link Aggregation – LAG) and provides link redundancy (member link failure in a LAG). When multiple bridges exist, link bundles are extended to provide peering between multiple bridges (Multi-Chassis Link Aggregation – MLAG), increasing bridge node resiliency along with link redundancy and performance. In both of these cases, the link bundles are treated by STP as a single logical link and the creation of a loop is prevented (loop free). In each of these cases, STP acts as a failsafe.

While LAG and MLAG were in use for many years, other approaches for building loop free topologies arose by using ECMP (Equal Cost Multi-Path), either at the MAC layer or IP layer. FabricPath or TRILL (Transparent Interconnect of Lots of Links) are MAC layer ECMP approaches that emerged in the last decade. More recently, Network Virtualization Overlays that build loop free topologies on top of IP layer ECMP became the state-of-the-art. VXLAN is the most prevalent network virtualization protocol in use today that builds loop free topologies.

A loop-free VXLAN overlay network.

While a VXLAN Overlay provides a loop free layer 2 service over IP ECMP, a layer 2 loop may still be introduced by connecting an L2 Ethernet network. VXLAN Edge-Devices act as bridges between VXLAN and Ethernet, known as Layer 2 Gateways (L2GW). A loop on the Ethernet network side can still introduce harmful broadcast radiation to the loop-free overlay network. If a loop is accidentally configured, physically or logically, the absence of a Loop Prevention protocol in VXLAN could allow the existence of a loop. While the layer 2 service in the VXLAN overlay network does not participate in the Spanning Tree Protocol, even if it could, blocking of a link in a loop-free overlay network would not prevent a loop but might cause additional harm, such as loss of service.

While proposals exist to integrate the overlay network with STP, these proposals treat all Edge-Devices as a single STP root bridge – Layer 2 Gateway STP (L2G-STP). While this approach is valid, it introduces rigidity into the deployment of modern overlay networks, reducing flexibility. With L2G-STP or similar approaches, the location of the STP root is predefined and hence can’t adjust to network designs that require a different location for this function. While L2G-STP can be used as a separate feature, the same functionality can be configured with a common STP root priority on the Edge-Device and the use of STP Root Guard.

In order to maintain the flexibility of overlay network deployments with VXLAN but have the ability to detect and protect against potential loops, Cisco provides an innovation: VXLAN EVPN Southbound Loop Detection and Mitigation.

Southbound Loop Detection and Mitigation


Let’s look at a VXLAN network in a spine/leaf topology to define “southbound looping”. The leaf is acting as a Network Virtualization Edge-Device that is hosting the VXLAN Tunnel Endpoint (VTEP) function. In this topology, the VXLAN network represents the “northbound” portion of the network. The network from the leaf or Edge-Device to the “south” is most commonly the Ethernet network. As loops are potentially formed in this “southbound” network, the goal is to detect and mitigate loops that are introduced by the “southbound” network.

North and south network topology.

Operations, Administration, and Maintenance (OAM) provides a framework for Connectivity Fault Management (CFM) defined in IEEE 802.1ag. Within this protocol framework and specifications, a continuous check message traverses intermediate bridges. This is a key criterion for enabling uninterrupted transfer of signaling across north-south borders. Based on well-defined triggers that span from initial port up to duplicate MAC detection (RFC 7432 Section 15.1), check message probes are sent in a focused manner to detect if and where loops exist.

Loop detection is provided exclusively by the Edge-Devices that form the “northbound” VXLAN and bridge to the “southbound” Ethernet network. If the probe is not returned to the sending Edge-Device, then no southbound loop exists. If a southbound probe is returned, the existence of a loop is validated. As Edge-Devices become aware of a detected loop, notifications are shared with network operators and mitigation actions are initiated.
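The "probe comes back means loop" logic can be illustrated with a toy flooding model. This is only a sketch of the principle, not the CFM probe format or the NX-OS implementation; the topology dictionaries, node names and the `probe_returns` function are assumptions for illustration. Each switch floods the probe out of every link except the one it arrived on, as an Ethernet bridge would for an unknown destination.

```python
from collections import deque
from typing import Dict, Set


def probe_returns(links: Dict[str, Set[str]], edge: str, first_hop: str) -> bool:
    """Flood a probe from `edge` via `first_hop`; True if it comes back."""
    seen = {(edge, first_hop)}          # (sender, receiver) hops already taken
    queue = deque([(edge, first_hop)])
    while queue:
        sender, node = queue.popleft()
        for neighbor in links[node]:
            if neighbor == sender:
                continue                # never flood back out the ingress link
            if neighbor == edge:
                return True             # probe arrived back at the Edge-Device
            if (node, neighbor) not in seen:
                seen.add((node, neighbor))
                queue.append((node, neighbor))
    return False


# Loop-free southbound tree: leaf - A, with B and C hanging off A.
tree = {"leaf": {"A"}, "A": {"leaf", "B", "C"}, "B": {"A"}, "C": {"A"}}

# Same switches, but with an extra B - C link that closes a loop.
looped = {"leaf": {"A"}, "A": {"leaf", "B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

print(probe_returns(tree, "leaf", "A"))
print(probe_returns(looped, "leaf", "A"))
```

In the tree the probe dies out at the leaves of the topology, while in the looped topology it circles back through A to the Edge-Device, which is the signal that a southbound loop exists.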

A probe uncovers a loop in a southbound Ethernet network.

Loop Mitigation and Recovery


As part of the mitigation, the “southbound” Ethernet interfaces that participate in a loop are identified. As loops can exist in some VLANs but not in others, control at the granularity of a Port,VLAN combination is significant. During mitigation, only the specific offending combination of VLAN and port is suspended to break the detected loop and stop traffic radiation without disrupting other traffic on the port. Breaking the loop changes the topology, which can affect the accuracy of the MAC address table. Therefore, a MAC flush is initiated in the VLAN with the detected loop to enable proper re-learning and forwarding after the loop mitigation.

Once a loop has been mitigated, it can be difficult to know if the recovery—the unsuspending of a Port,VLAN combination—will reintroduce the loop. In order to prevent a false-recovery and loop reintroduction, a probe is sent prior to initiating the recovery while the Port,VLAN combination stays suspended (doesn’t forward traffic). If the probe still reports an indication of an existing southbound loop, the recovery process is stopped and the Port,VLAN stays suspended. After a given interval, loop detection is reinitiated. The recovery process continues until no loop is detected. Appropriate configuration, notification, and override commands are available to the Network Operator.
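The probe-before-recovery behavior described above can be sketched as a small loop. This is an illustration under stated assumptions, not the NX-OS logic: `recovery_cycle`, the `loop_present` callback, and the `max_attempts` cap are all hypothetical (the text says recovery keeps retrying until no loop is detected), and the wait between probes is left out.

```python
def recovery_cycle(port_vlan, loop_present, max_attempts=5):
    """Keep a (port, VLAN) pair suspended until a probe reports no loop.

    loop_present: callable(port, vlan) -> True while the probe still returns.
    max_attempts: illustrative cap so the sketch always terminates.
    Returns (final_state, probes_sent).
    """
    port, vlan = port_vlan
    for attempt in range(1, max_attempts + 1):
        # The pair stays suspended while the probe is sent.
        if not loop_present(port, vlan):
            return "forwarding", attempt   # safe to resume forwarding
        # Loop still present: remain suspended and probe again later.
    return "suspended", max_attempts

# Example: the loop disappears on the third probe.
answers = iter([True, True, False])
print(recovery_cycle(("Ethernet1/1", 10), lambda port, vlan: next(answers)))
```

The key design point is that the state only changes to forwarding after a clean probe, so a recovery attempt can never reintroduce the loop.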

VXLAN EVPN with Built-In Southbound Loop Detection and Mitigation


Cisco NX-OS 9.3(5) provides native southbound loop detection and mitigation for VXLAN EVPN fabrics. The functionality extends the loop-free behavior of VXLAN EVPN’s Network Virtualization Overlay with existing Ethernet networks. While there are many use-cases that require loop detection and mitigation in a single fabric, the same functionality is available for VXLAN EVPN Multi-Site deployments. For these Multi-Site deployments, loop detection and mitigation supports the detection of backdoor links, the most prevalent cause of multi-site outages during extension or migrations.  

While many loop protection solutions support detecting the existence of loops in the overall topology and shutting down the offending ports, VXLAN EVPN Loop Detection and Mitigation defines the topology at the “VLAN-level”. It acts with a granularity comparable to the Per-VLAN Spanning Tree variations (PVST+ and PVRST/802.1w). Unlike Spanning Tree, it does not proactively calculate a forwarding tree; instead, it takes precautions to keep loops from forming and from being introduced into the Overlay. VXLAN EVPN southbound loop detection and mitigation aims to ensure network uptime and avoid unnecessary risks due to loop creation, whether it is within a single fabric or across multiple fabrics with VXLAN EVPN Multi-Site.

Looping can be accidentally introduced into multi-site fabrics through backdoor links.

Innovative Solutions for Increasing Data Center Resiliency


Increasing the stability of data center fabrics is key to supporting business resiliency — whether for a single on-premise brownfield fabric or when adding new multi-site greenfield fabrics. In order to optimize application performance and network stability, modern networks need to build upon a consistent, up-to-date platform instead of relying on a patchwork of technologies that can cause more conflicts than resolutions.

Even though modern VXLAN EVPN overlays prevent most looping scenarios natively, combining them with older network topologies can still introduce the risk of corrosive loops. Even carefully designed multi-site VXLAN EVPN data center fabrics can still accidentally create backdoor links, leading to looping-related performance issues. Cisco Nexus 9000 Series based NX-OS VXLAN implementation addresses the most prevalent loop scenarios within and among multi-site data centers to build and maintain a stable and resilient network architecture for your organization.

Wednesday 23 September 2020

Why SOAR Is the Future of Your IT Security

The threat landscape evolves constantly, with new and increasingly sophisticated cyberattacks launching with growing frequency across network, cloud, and software-as-a-service environments.

As threats continue to stack up against organizations, IT teams face the challenge of managing heterogeneous end-user device environments composed of various network-connected devices, operating systems, and applications. They must ensure that consistent, organizationally-sanctioned controls are applied across these environments.

While this is achievable with the right security expertise, there is also a global cybersecurity skills shortage. In fact, 3.5 million cybersecurity positions are expected to remain unfilled by 2021.

These challenges are not insurmountable. They can be conquered with the security operations and incident response approach called SOAR.

What is SOAR?

SOAR refers to a solution stack of compatible software that allows organizations to orchestrate and automate different parts of security management and operations to improve the accuracy, consistency, and efficiency of security processes and workflows with automated responses to threats.

How does SOAR work?

Security orchestration

The first component of SOAR, security orchestration, involves combining different, compatible products within a solution stack and orchestrating management and operations activities through standardized workflows. These security solutions automatically aggregate data from multiple sources, add context to that data to identify potential weaknesses, and use risk modeling scenarios to enable automated threat detection. Recognizing this, more and more organizations are prioritizing effective integration between security technologies to enable rapid threat detection and response.

Security automation

The second component is security automation, which involves automating many of the repetitive actions involved in the threat detection process.

Traditionally, security analysts within an organization would handle threat alerts manually, usually multi-tasking to size up alerts from numerous point solutions. This increases the likelihood of human error, inconsistent threat response, and high severity threats being overlooked.

SOAR, on the other hand, automates gathering enrichment and intelligence data on an event, can perform common investigative steps on behalf of the analyst to help triage events, and consistently delivers on the orchestration and response of the incident response lifecycle.

Security response

The third component, security response, involves triage, containment, and eradication of threats.

Response methods depend on the type and scope of the threat. Some threat responses can be automated for faster results, such as quarantining files, blocking file hashes across the organization, isolating a host or disabling access to compromised accounts.

However, sophisticated cyber-attacks require sophisticated responses. This is where security playbooks come in.

With Cisco Managed Detection and Response (MDR), automation is supported by defined investigation and response playbooks, containing overviews of known threat scenarios and best practices for responding to different types of threats. The role of automation is to rapidly execute these playbooks.

What does a threat detection and response process look like with SOAR?

Let’s start with an example based on AMP for Endpoints identifying a file as potentially malicious. SOAR would be able to begin the investigation process, answering questions and performing tasks automatically, such as:

◉ Was the file quarantined?
◉ Was the file executed?
◉ Where else has this file been seen in the network?
◉ Detonate the file in a Cisco Threat Grid sandboxing environment
◉ Investigate using available context related to connection, file, and source at relevant technologies, such as Umbrella and Stealthwatch Cloud
◉ Retrieve any available threat intelligence information on the file and check for occurrences of known indicators of compromise (IOCs)
◉ Collect identification information on the host and username

The answers to these questions provide contextual information to the investigator to aid in determining the legitimacy, impact, urgency, and scope of the incident. This information in turn determines appropriate response actions, which may include:

◉ Quarantining the host on the network
◉ Blocking the file hash across the network
◉ Blocking IOCs
◉ Scanning and cleaning any devices with occurrences of IOCs
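
To show how the investigation answers could feed the response actions listed above, here is a small Python sketch of a playbook step. Everything in it is hypothetical: the `FileAlert` fields, the `choose_responses` function, and especially the decision rules are illustrative assumptions, not the logic of Cisco MDR or any real SOAR product.

```python
from dataclasses import dataclass, field

@dataclass
class FileAlert:
    """Answers gathered during enrichment (hypothetical field names)."""
    sha256: str
    host: str
    quarantined: bool
    executed: bool
    seen_on_hosts: list = field(default_factory=list)
    known_ioc: bool = False

def choose_responses(alert):
    """Map enrichment answers to response actions (illustrative rules only)."""
    actions = []
    if alert.executed and not alert.quarantined:
        actions.append(f"quarantine host {alert.host}")
    if alert.known_ioc or alert.executed:
        actions.append(f"block file hash {alert.sha256}")
    if alert.known_ioc:
        actions.append("block IOCs")
    for other in alert.seen_on_hosts:
        if other != alert.host:
            actions.append(f"scan and clean {other}")
    return actions

alert = FileAlert("abc123", "pc-1", quarantined=False, executed=True,
                  seen_on_hosts=["pc-1", "pc-2"], known_ioc=True)
for action in choose_responses(alert):
    print(action)
```

In a real deployment each action string would be a call into the relevant security product; keeping the decision step separate from the execution step makes the playbook easier to review.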

Betting on SOAR

The cybersecurity skills shortage, tight IT budgets, the dynamic nature of the threat landscape, and the need to optimize security operations make SOAR a compelling proposition.

With Cisco MDR, security alerts, correlation, and enrichment are automated; blocked items are propagated for instant containment; and indicators of compromise are reported near-instantly for blocking, hunting, and follow-up.

The result is streamlined security operations and a stronger security posture without breaking the IT budget or having to recruit a team of security analysts.

Tuesday 22 September 2020

Threat Landscape Trends: Endpoint Security, Part 1

Part 1: Critical severity threats and MITRE ATT&CK tactics

In the ongoing battle to defend your organization, deciding where to dedicate resources is vital. To do so efficiently, you need to have a solid understanding of your local network topology, cloud implementations, software and hardware assets, and the security policies in place. On top of that, you need to have an understanding of what’s traveling through and residing in your environment, and how to respond when something is found that shouldn’t be there.

This is why threat intelligence is so vital. Not only can threat intelligence help to defend what you have, it can tell you where you’re potentially vulnerable, as well as where you’ve been attacked in the past. It can ultimately help inform where to dedicate your security resources.

What threat intelligence can’t tell you is exactly where you’ll be attacked next. The fact is that there’s no perfect way to predict an attacker’s next move. The closest you can come is knowing what’s happening out in the larger threat landscape: how attackers are targeting organizations across the board. From there it’s possible to make those critical, informed decisions based on the data at hand.

This is the purpose of this new blog series, Threat Landscape Trends. In it, we’ll be taking a look at activity in the threat landscape and sharing the latest trends we see. By doing so, we hope to shed light on areas where you can quickly have an impact defending your assets, especially if dealing with limited security resources.

To do this, we’ll dive into various Cisco Security technologies that monitor, alert, and block suspected malicious activity. Each release will focus on a different product, given the unique view of activity each can provide, informing you on different aspects of the threat landscape.

Beginning at the endpoint

To kick off the series, we’ll begin with Cisco’s Endpoint Security solution. Over the course of two blog posts we’ll examine what sort of activity we’ve seen on the endpoint in the first half of 2020. In the first, we’ll look at critical severity threats and the MITRE ATT&CK framework. In part two, to be published in the coming weeks, we’ll dive deeper into the data, providing more technical detail on threat types and the tools used by attackers.

To protect an endpoint, Cisco’s Endpoint Security solution leverages a protection lattice comprised of several technologies that work together. We’ll drill down into telemetry from one of these technologies here: the Cloud Indication of Compromise (IoC) feature, which can detect suspicious behaviors observed on endpoints and look for patterns related to malicious activity.

In terms of methodology for the analysis that follows, the data is similar to alerts you would see within the dashboard of Cisco’s Endpoint Security solution, only aggregated across organizations to get the percentage of organizations that have encountered particular IoCs as a baseline. The data set covers the first half of 2020, from January 1st through June 30th. We’ll cover this in more detail in the Methodology section at the end of this post, but for now, let’s dive into the data.
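
As a rough illustration of "percentage of organizations that have encountered a particular IoC", the following Python sketch aggregates raw alerts per organization. The input shape, the function name `org_percentages`, and the choice of denominator (every organization that appears in the data) are assumptions made for this example; the actual methodology may differ.

```python
from collections import defaultdict

def org_percentages(alerts):
    """alerts: iterable of (org_id, ioc_name) pairs.

    Returns {ioc_name: percent of organizations that saw it at least once}.
    Repeated alerts inside one organization count only once.
    """
    orgs = set()
    orgs_by_ioc = defaultdict(set)
    for org_id, ioc in alerts:
        orgs.add(org_id)
        orgs_by_ioc[ioc].add(org_id)
    if not orgs:
        return {}
    return {ioc: 100 * len(seen) / len(orgs) for ioc, seen in orgs_by_ioc.items()}

sample = [("org-a", "fileless"), ("org-a", "fileless"), ("org-b", "fileless"),
          ("org-b", "dual-use"), ("org-c", "ransomware"), ("org-d", "fileless")]
print(org_percentages(sample))
```

Counting organizations rather than raw alerts keeps one noisy environment from dominating the picture.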

Threat severity

When using Cisco’s Endpoint Security solution, one of the first things you’ll notice in the dashboards is that alerts are sorted into four threat severity categories: low, medium, high, and critical. Here is a breakdown of these severity categories in terms of the frequency that organizations encountered IoC alerts:

Percentage of low, medium, high, and critical severity IoCs

As you might expect, the vast majority of alerts fall into the low and medium categories. There’s a wide variety of IoCs within these severities. How serious a threat the activity leading to these alerts poses depends on a number of factors, which we’ll look at more broadly in part two of this blog series.

For now, let’s start with the most serious IoCs that Cisco’s Endpoint Security solution will alert on: the critical severity IoCs. While these make up a small portion of the overall IoC alerts, they’re arguably the most destructive, requiring immediate attention if seen.

Critical severity IoCs

Sorting the critical IoCs into similar groups, the most common threat category seen was fileless malware. These IoCs indicate the presence of fileless threats—malicious code that runs in memory after initial infection, rather than through files stored on the hard drive. Here, Cisco’s Endpoint Security solution detects activity such as suspicious process injections and registry activity. Some threats often seen here include Kovter, Poweliks, Divergent, and LemonDuck.

Coming in second are dual-use tools leveraged for both exploitation and post-exploitation tasks. PowerShell Empire, CobaltStrike, Powersploit, and Metasploit are four such tools currently seen here. While these tools can very well be used for non-malicious activity, such as penetration testing, bad actors frequently utilize them. If you receive such an alert, and do not have any such active cybersecurity exercises in play, an immediate investigation is in order.

The third most frequently seen IoC group is another category of dual-use tools: credential dumping tools. Credential dumping is the process used by malicious actors to scrape login credentials from a compromised computer. The most commonly seen of these tools in the first half of 2020 is Mimikatz, which Cisco’s Endpoint Security solution caught dumping credentials from memory.

All told, these first three categories comprise 75 percent of the critical severity IoCs seen. The remaining 25 percent contains a mix of behaviors known to be carried out by well-known threat types:
  • Ransomware threats like Ryuk, Maze, BitPaymer, and others
  • Worms such as Ramnit and Qakbot
  • Remote access trojans like Corebot and Glupteba
  • Banking trojans like Cridex, Dyre, Astaroth, and Azorult
  • …and finally, a mix of downloaders, wipers, and rootkits

MITRE ATT&CK tactics


Another way to look at the IoC data is by using the tactic categories laid out in the MITRE ATT&CK framework. Within Cisco’s Endpoint Security solution, each IoC includes information about the MITRE ATT&CK tactics employed. These tactics can provide context on the objectives of different parts of an attack, such as moving laterally through a network or exfiltrating confidential information.

Multiple tactics can also apply to a single IoC. For example, an IoC that covers a dual-use tool such as PowerShell Empire covers three tactics:
  • Defense Evasion: It can hide its activities from being detected.
  • Execution: It can run further modules to carry out malicious tasks.
  • Credential Access: It can load modules that steal credentials.
With this overlap in mind, let’s look at each tactic as a percentage of all IoCs seen:

IoCs grouped by MITRE ATT&CK tactics

By far the most common tactic, Defense Evasion appears in 57 percent of IoC alerts seen. This isn’t surprising, as actively attempting to avoid detection is a key component of most modern attacks.

Execution also appears frequently, at 41 percent, as bad actors often launch further malicious code during multi-stage attacks. For example, an attacker that has established persistence using a dual-use tool may follow up by downloading and executing a credential dumping tool or ransomware on the compromised computer.

Two tactics commonly used to gain a foothold, Persistence and Initial Access, come in third and fourth, showing up 12 and 11 percent of the time, respectively. Communication through Command and Control rounds out the top five tactics, appearing in 10 percent of the IoCs seen.
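
Because one IoC can map to several tactics, these percentages are not shares of a single pie: the top five alone add up to 131 percent. A minimal Python sketch of that kind of counting is below; the function name `tactic_percentages` and the sample data are made up for illustration.

```python
from collections import Counter

def tactic_percentages(iocs):
    """iocs: list of tactic collections, one per IoC.

    Returns {tactic: percent of IoCs that include that tactic}.
    An IoC with several tactics counts once toward each of them.
    """
    counts = Counter(tactic for tactics in iocs for tactic in set(tactics))
    return {tactic: 100 * n / len(iocs) for tactic, n in counts.items()}

sample = [
    {"Defense Evasion", "Execution", "Credential Access"},  # e.g. a dual-use tool
    {"Defense Evasion"},
    {"Execution"},
    {"Persistence"},
]
shares = tactic_percentages(sample)
print(shares)
print(sum(shares.values()))  # exceeds 100 because of the overlap
```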

Critical tactics

While this paints an interesting picture of the threat landscape, things become even more interesting when combining MITRE ATT&CK tactics with IoCs of a critical severity.

Critical severity IoCs grouped by MITRE ATT&CK tactics

For starters, two of the tactics were not seen in the critical severity IoCs at all, and two more registered less than one percent. This effectively removes a third of the tactics from focus.

What’s also interesting is how the frequency has been shuffled around. The top three remains the same, but Execution is more common amongst critical severity IoCs than Defense Evasion. Other significant moves when filtering by critical severity include:

  • Persistence appears in 38 percent of critical IoCs, as opposed to 12 percent of IoCs overall.
  • Lateral Movement jumps from 4 percent of IoCs seen to 22 percent.
  • Credential Access moves up three spots, increasing from 4 percent to 21 percent.
  • The Impact and Collection tactics both see modest increases.
  • Privilege Escalation plummets from 8 percent to 0.3 percent.
  • Initial Access drops off the list entirely, previously appearing fourth.
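
To put the moves listed above on a common scale, the percentages can be turned into ratios of critical share to overall share. This short Python sketch uses only the figures quoted in the list; the names `SHIFTS` and `fold_changes` are just for illustration.

```python
# Percentages quoted in the list above: (share of all IoCs, share of critical IoCs)
SHIFTS = {
    "Persistence": (12, 38),
    "Lateral Movement": (4, 22),
    "Credential Access": (4, 21),
    "Privilege Escalation": (8, 0.3),
}

def fold_changes(shifts):
    """Return {tactic: critical share divided by overall share}."""
    return {tactic: critical / overall
            for tactic, (overall, critical) in shifts.items()}

for tactic, change in fold_changes(SHIFTS).items():
    print(f"{tactic}: {change:.2f}x")
```

A ratio above 1 means the tactic is over-represented among critical IoCs; a ratio well below 1, as with Privilege Escalation, means it is mostly seen at lower severities.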

Defending against the critical


This wraps up our high-level rundown of the IoC data. So armed with this information about the common threat categories and tactics, what can you do to defend your endpoints? Here are a few suggestions about things to look at:

Limit execution of unknown files

If malicious files can’t be executed, they can’t carry out malicious activity. Use group policies and/or “allow lists” for applications that are permitted to run on endpoints in your environment. That’s not to say that every control available should be leveraged in order to completely lock an endpoint down—limiting end-user permissions too severely can create entirely different usability problems.
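
As a rough sketch of the allow-list idea, the Python snippet below permits a file only if its SHA-256 hash is on an approved list. This is a simplified illustration, not how group policies or any endpoint product enforce application control; `is_allowed` and the sample contents are hypothetical.

```python
import hashlib

def is_allowed(file_bytes, allowed_sha256):
    """Return True only if the file's SHA-256 digest is on the allow list."""
    return hashlib.sha256(file_bytes).hexdigest() in allowed_sha256

# Build an allow list from known-good content (placeholder bytes for the example).
allowed = {hashlib.sha256(b"trusted installer").hexdigest()}

print(is_allowed(b"trusted installer", allowed))  # True
print(is_allowed(b"unknown download", allowed))   # False
```

Hash-based lists are strict: any change to a file, including a legitimate update, produces a new hash that has to be approved again.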

If your organization utilizes dual-use tools for activities like remote management, severely limit the number of accounts that are permitted to run the tools, only granting temporary access when the tools are needed.

Monitor processes and the registry

Registry modification and process injection are two primary techniques used by fileless malware to hide its activity. Monitoring the registry for unusual changes and looking for strange process injection attempts will go a long way towards preventing such threats from gaining a foothold.

Monitor connections between endpoints

Keep an eye on the connections between different endpoints, as well as connections to servers within the environment. Investigate if two machines are connecting that shouldn’t, or an endpoint is talking to a server in a way that it doesn’t normally. This could be a sign that bad actors are attempting to move laterally across a network.
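
One simple way to spot connections "that shouldn't" happen is to compare what you observe against a baseline of known-normal connections. The Python sketch below assumes connections are recorded as (source, destination, port) tuples; that format, the function name `unusual_connections`, and the host names are illustrative assumptions.

```python
def unusual_connections(baseline, observed):
    """Return observed (src, dst, port) tuples that are absent from the baseline."""
    return sorted(set(observed) - set(baseline))

baseline = [("pc-1", "fileserver", 445), ("pc-2", "fileserver", 445)]
observed = [("pc-1", "fileserver", 445), ("pc-1", "pc-2", 3389)]

# pc-1 opening a remote desktop session to pc-2 is not in the baseline.
print(unusual_connections(baseline, observed))
```

Anything the function returns is a candidate for investigation, not proof of lateral movement; the baseline needs regular updating as the environment changes.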

Monday 21 September 2020

How to Prepare for Cisco CCNP Enterprise 350-401 Certification?


Cisco ENCOR Exam Description:

This exam tests a candidate's knowledge of implementing core enterprise network technologies including dual stack (IPv4 and IPv6) architecture, virtualization, infrastructure, network assurance, security and automation. The course, Implementing Cisco Enterprise Network Core Technologies, helps candidates to prepare for this exam.

Cisco 350-401 Exam Overview:


Related Article: