Sunday, 17 April 2022

New Software Architecture Enables Session-Aware Networking to Massively Scale Authentication and Access Policy Control

As enterprise networks become more complex, the demands and challenges to secure them are increasing. Increased mobility, wireless networks, and Bring Your Own Device (BYOD) initiatives have broadened the attack surface. Access security must be capable of scaling to accommodate the increased access demands of myriad devices.

Session Aware Networking (SANet) is a framework and set of features that provide authentication, access control, and user-specific policies. The SANet re-architecture has evolved it from a single-core Cisco IOS XE application into a horizontally scalable application that adopts Cisco’s database-centric programming model. Device state is now maintained in the database, and the software takes advantage of the multicore capabilities of device platforms.

The decoupling of SANet features from the IOS XE daemon allows for much greater authentication scalability and flexibility in addressing various business requirements.

Scaling Access Security

SANet is the session management software on IOS XE-based devices and plays a vital role in Identity Based Networking Services (IBNS). Enterprise wired and wireless networking products that run IOS XE use SANet to handle session management (Figure 1). Having the same control plane software for session management across all Cisco enterprise product families that run IOS XE enables two things:

◉ Higher feature velocity and availability across all the products

◉ A uniform control plane across all Cisco products that enables the deployment of security policies at multiple locations in the network with ease

Figure 1. SANet Architecture and Features

Following the principles of the IOS XE database-centric programming model and horizontally scalable architecture, SANet was designed to address the expanding scalability requirements of wired and wireless networks. For example, wireless LAN controllers may have higher scalability requirements than fixed-port switches. The architecture offers a more consistent way to configure features across technologies, along with easy deployment and customization of features. Having a single solution for these diverse requirements simplifies operations through standardization.

The database-centric programming model, together with the IOS XE infrastructure, provides access to features like compiler-integrated patching, integrated telemetry, and unified software tracing, to name a few. SANet also benefits from any future enhancements to the complete IOS XE stack, such as process restartability and multi-tenancy.

Multiple Authentication Methods and Comprehensive Policy Control


SANet provides an extensive list of authentication mechanisms and a robust policy framework that can apply policies defined locally or on an external server. Session insights or attributes are sent during authentication or accounting to a configured external server, like Cisco Identity Services Engine (ISE) or third-party servers, to make network policies flexible, consistent across the network, and easy to manage.

Authentication methods available with SANet include 802.1X, Web Authentication, and MAC Authentication Bypass (MAB). These methods can be combined to address various business requirements; for example, MAB followed by Web-based authentication can serve solutions that demand diverse types and combinations of session policies. Security policies, such as an Access Control List (ACL) applied initially to a user session, can change as more user identity details are learned. Alternatively, a policy may be applied to a guest user to limit the time that the user is allowed to stay connected to the network.
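As an illustration of method ordering and dynamic policy, the following Python sketch chains MAB and Web Authentication and widens the session ACL as more identity is learned. All function and attribute names here are invented for illustration; they are not SANet APIs.

```python
# Hedged sketch: chaining authentication methods and updating session
# policy as identity detail grows. Names are illustrative, not SANet APIs.

def authenticate(session, methods):
    """Try each configured method in order; return the first that succeeds."""
    for method in methods:
        if method(session):
            return method.__name__
    return None

def mab(session):
    # MAC Authentication Bypass: look the MAC up in a local allow-list.
    return session["mac"] in {"aa:bb:cc:dd:ee:ff"}

def webauth(session):
    # Web authentication: assume a captive portal already set this flag.
    return session.get("portal_credentials_ok", False)

session = {"mac": "11:22:33:44:55:66", "portal_credentials_ok": True,
           "acl": "GUEST-RESTRICTED"}          # initial, restrictive ACL

method = authenticate(session, [mab, webauth])
if method == "webauth":
    session["acl"] = "EMPLOYEE-FULL"           # richer identity, wider policy
```

The fallback order mirrors the MAB-then-Web-Auth combination described above: the restrictive ACL applies until a stronger identity is established.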

Source: cisco.com

Saturday, 16 April 2022

Intersight Workload Optimizer: How to Tame the Public Cloud

In this installment, we’re going to focus on public cloud optimization, which differs slightly from its on-premises counterpart. In an on-premises data center, infrastructure is generally finite in scale and fixed in cost. By the time a new physical server hits the floor, the capital has been spent and has taken a hit on your business’s bottom line. In this context, on-premises optimization means maximizing utilization of the sunk cost of capital infrastructure (while still assuring performance of the workload, of course).

In the public cloud, however, infrastructure is effectively infinite. Resources are generally far more elastic and often paid for out of an operating expenditure budget rather than a capital budget. In this case, cloud optimization means minimizing cloud spend, and the burden of maximizing hardware utilization falls to the cloud provider. Minimizing cloud spend proves to be a daunting exercise for cloud administrators given the public cloud’s vast array of instance sizes and types (over 400 in Amazon Web Services alone, as shown in Figure 1), all with slightly different resource profiles and costs, and with new options and pricing changing almost daily. At scale, selecting the ideal instance type, size, term, and so on for every workload at every moment, in order to assure performance and minimize spend, is arguably an impossible task for a human, but it is an ideal use case for the IWO decision engine.
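At small scale, the core matching problem looks simple; the difficulty is doing it continuously across hundreds of instance types and thousands of workloads. A minimal sketch, with an invented four-entry catalog and illustrative prices standing in for the 400+ real instance types:

```python
# Hedged sketch: pick the cheapest instance type that satisfies a
# workload's resource needs. Catalog entries and prices are illustrative;
# a real catalog would come from the cloud provider's API.

CATALOG = [
    {"name": "m5.large",   "vcpu": 2, "mem_gib": 8,  "usd_hr": 0.096},
    {"name": "m5.xlarge",  "vcpu": 4, "mem_gib": 16, "usd_hr": 0.192},
    {"name": "r5.large",   "vcpu": 2, "mem_gib": 16, "usd_hr": 0.126},
    {"name": "c5.2xlarge", "vcpu": 8, "mem_gib": 16, "usd_hr": 0.340},
]

def cheapest_fit(vcpu_needed, mem_needed_gib):
    """Return the lowest-cost catalog entry that meets both requirements."""
    fits = [i for i in CATALOG
            if i["vcpu"] >= vcpu_needed and i["mem_gib"] >= mem_needed_gib]
    return min(fits, key=lambda i: i["usd_hr"])["name"] if fits else None

# A workload needing 2 vCPU and 12 GiB fits three of the four types;
# the memory-optimized one wins on price.
print(cheapest_fit(2, 12))  # -> r5.large
```

Rerunning this selection continuously, as prices and workload demand shift, is the part that outgrows human effort.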

Figure 1: Amazon Web Services instance types

Taking action in the public cloud


So let’s take a look at the types of real-time actions IWO offers for public cloud optimization. In Figure 2, starting on the Cloud tab of the main Supply Chain screen, we see a number of widgets on the right with actionable information – Pending Actions, Top Accounts, Necessary Investments, Potential Savings, etc.

Figure 2: Supply Chain view of the Public Cloud and Pending Actions widget

Clicking on “Show All” in the Top Accounts widget, we see a list of all our public cloud accounts and subscriptions in a hierarchical table, as shown in Figure 3.

Figure 3: Public cloud account details table

Clicking on one of the green action buttons on the right, we see the current pending actions for a specific account, as shown in Figure 4. There we see a number of storage volume actions highlighted, some relating to performance needs, others to recouping savings from over-provisioning (i.e., you can move to a cheaper tier of storage and still assure performance).

Figure 4: Action Center table with details on specific pending storage actions for a given account

In this specific example, a keen-eyed reader might notice something curious about the two performance actions at the top of the list: even though the actions are being taken to provide more IOPS (moving from 160 to 3000 IOPS) to assure performance, the cost impact is actually lower. That’s right – these actions are providing more performance for less cost! While maybe not entirely common, this example shows just how quirky the plethora of public cloud options can be, and how difficult it can be for humans to avoid leaving money on the table. (This example is also non-disruptive and reversible, as noted in the table, with the ability to execute immediately with the click of a button. What’s not to like?)
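As a rough sketch of how such a counterintuitive action can pencil out, assume an older volume tier priced per GB with size-linked IOPS, and a newer, cheaper tier with a higher included IOPS baseline. The prices and tier rules below are invented to mirror the pattern; they are not current AWS list prices.

```python
# Illustrative only: an older tier ties IOPS to volume size, while a
# newer tier costs less per GB and includes a 3000-IOPS baseline.

size_gib = 160

old_tier = {"usd_per_gib_month": 0.10, "iops": 3 * size_gib}   # 480 IOPS
new_tier = {"usd_per_gib_month": 0.08, "iops": 3000}           # baseline IOPS

old_cost = size_gib * old_tier["usd_per_gib_month"]   # 16.00 USD/month
new_cost = size_gib * new_tier["usd_per_gib_month"]   # 12.80 USD/month

# More IOPS (480 -> 3000) for less money (16.00 -> 12.80): the move is
# simultaneously a performance improvement and a savings.
assert new_tier["iops"] > old_tier["iops"] and new_cost < old_cost
```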

Clicking on the Scale Virtual Machines tab in the Action Center list, we see the current pending actions to rightsize our VMs, as shown in Figure 5.

Figure 5: Action Center table with details on specific pending VM actions for a given account

Clicking on the details button in the first row takes us to the Action Details window providing us clear data behind the decision, as well as the expected outcome of the action from both a performance and a cost perspective, as shown in Figure 6. We can also conveniently run the action with a single button click, right from the dashboard interface.

Figure 6: Action Details for a specific VM scaling action

This detailed information is available for every action IWO recommends, across all workloads in all cloud accounts. Choosing the right action, with even just a handful of workloads, is difficult for a human. Getting it right across many tens, hundreds, or thousands of workloads spread across multiple accounts in multiple clouds in real time is a problem that IWO is uniquely positioned to solve.

Reserved instances: rent or lease?


To further complicate matters for a cloud administrator, you have the option of consuming instances in an on-demand fashion — i.e., pay as you use — or via Reserved Instances (RIs) which you pay for in advance for a fixed term (usually a year or more). RIs can be incredibly attractive as they are typically heavily discounted compared to their on-demand counterparts, but they are not without their pitfalls.

The fundamental challenge of consuming RIs is that you will pay for the RI whether you use it or not. In this respect, RIs become more like the sunk cost of a physical server on-premises than the intermittent cost of an on-demand cloud instance. One can think of on-demand instances as being well-suited for temporary or highly variable workloads, analogous to a car-less city dweller renting a car: usually cost-effective for an occasional weekend trip, but cost-prohibitive for long-term use. RIs are akin to leasing a car: often the right economic choice for longer-term, more predictable usage patterns (say, commuting an hour to work each day).
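Under the rent-vs-lease analogy, the break-even point can be estimated with a couple of lines of arithmetic. All rates below are illustrative, not quoted cloud prices:

```python
# Hedged sketch: an RI becomes the cheaper choice once expected
# utilization exceeds the ratio of the RI's effective hourly rate to the
# on-demand rate. Rates are illustrative.

on_demand_usd_hr = 0.192          # illustrative on-demand rate
ri_upfront_usd   = 1000.0         # illustrative 1-year all-upfront RI
term_hours       = 365 * 24

ri_effective_hr = ri_upfront_usd / term_hours          # ~0.114 USD/hr
break_even_util = ri_effective_hr / on_demand_usd_hr   # ~0.59

def cheaper_option(expected_util):
    """Return which purchase model costs less at a given utilization."""
    return "reserved" if expected_util > break_even_util else "on-demand"

print(cheaper_option(0.90))  # steady "commuter" workload -> reserved
print(cheaper_option(0.25))  # occasional "weekend trip" -> on-demand
```

At these example rates the lease wins only above roughly 59% utilization, which is exactly why variable workloads make RI purchases risky.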

When faced with a myriad of instance options and terms, you are generally forced down one of two paths: 1) only purchase RIs for workloads that are deemed static and consume on-demand instances for everything else (hoping, of course, that static workloads really do remain that way); or 2) pick a handful of RI instance types — e.g., small, medium, and large — and shoehorn all workloads, static or variable, into the closest fit. Both methods leave a lot to be desired.

In the first case, it’s not at all uncommon for static workloads to have their demand change over time as app use grows or new functionality comes online. In these cases, the workload will need to be relocated to a new instance type, and the administrator will have an empty hole to fill in the form of the old, already paid-for RI (see examples in Figure 7).

Figure 7: Changes in workload demand can trigger numerous cascading decisions for RI consumption

What should be done with that hole? What’s the best workload to move into it? And if that workload is coming from its own RI, the problem simply cascades downstream. The unpredictability of such headaches often negates the potential cost savings of RIs.

In the second scenario, limiting the RI choices almost by definition means mismatching workloads to instance types, negatively affecting either workload performance or cost savings, or both. In either case, human beings, even with complicated spreadsheets and scripts, will invariably get the answer wrong because the scale of the problem is too large and everything keeps changing, all the time, so the analysis done last week is likely to be invalid this week.

Thankfully, IWO was developed to understand both on-demand instances and RIs in detail through native API integrations with popular public cloud providers like AWS and Azure. IWO constantly receives real-time data on consumption, pricing, and instance options directly from the cloud providers, combining that data with knowledge of applicable customer-specific pricing and enterprise agreements to determine the best actions available at any given point in time.

Figure 8: Detailed inventory information and purchase actions for RIs

Not only does IWO understand current and historical workload requirements and an organization’s current RI inventory (see above), but it can also intelligently recommend the optimal consumption of existing RI inventory and additional RI purchases to minimize future spending. In Figure 9, we have a Pending Action to buy 13 RIs, which would take the RI coverage up to the horizontal black line in the chart. Most of the area under the blue and turquoise curves, representing the workload resource requirements, would be covered by RIs (everything below the black line). The peaks above the black line would be covered by on-demand purchases. While you could purchase enough RIs to cover all the area under the curve, this is not the most cost-effective option to meet workload demand.
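The trade-off behind that coverage line can be sketched with a toy cost model: RIs are billed every hour whether used or not, while on-demand covers only the excess. The rates and demand series below are illustrative, not IWO's actual inputs:

```python
# Hedged sketch of the Figure 9 decision: choose an RI count that
# minimizes total cost over a demand series. All numbers illustrative.

ri_rate, od_rate = 0.11, 0.19               # USD per instance-hour
demand = [10, 12, 15, 30, 14, 11, 10, 13]   # instances needed each hour

def total_cost(ri_count):
    ri_cost = ri_count * ri_rate * len(demand)               # paid used or not
    od_cost = sum(max(0, d - ri_count) for d in demand) * od_rate
    return ri_cost + od_cost

best = min(range(max(demand) + 1), key=total_cost)
# Covering the full peak (30 RIs) is wasteful; the optimum sits near the
# base load, leaving the brief spike to on-demand purchases.
print(best)  # -> 12
```

Buying one more RI pays off only while the fraction of hours it would be used exceeds the RI-to-on-demand rate ratio, which is why the optimal line sits below the peaks.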

Figure 9: Details supporting a specific RI purchase action

Continuing with our car analogy, in addition to knowing whether it’s better to rent or lease a car in any given circumstance, IWO can even suggest a car lease (RI purchase) that can be used as a vehicle for ride-sharing. IWO can fluidly move on-demand workloads in and out of a given RI to achieve the lowest possible cost while still assuring performance.

In short, IWO has the ability to understand the optimal combination of RI purchases and on-demand spending across your entire public cloud estate, in real-time.

Cloud Migration Planning


Finally, because IWO uses the same underlying decision engine for both the on-premises and public cloud environments, it can bridge the gap between them. The process of migrating VM workloads from on-prem to the public cloud can be simulated in IWO’s planning module, which allows you to select specific VMs or VM groups and generates the optimal purchase actions required to run them, as shown in Figure 10.

Figure 10: On-prem to public cloud workload migration planning results

These plan results offer two options: Lift & Shift and Optimized, depicted in the blue and green columns, respectively. Lift & Shift shows the recommended instances to buy, and their costs, assuming no changes to the size of the existing VMs. Optimized allows for VM right-sizing in the process of moving to the cloud, which often results in a lower overall cost if current VMs are oversized relative to their workload needs. Software licensing (e.g., bring-your-own vs. buy from the cloud) and RI profile customizations are also available to further fine-tune the plan results.

Have your cake and eat it too


IWO has the unique ability to apply the same market abstraction and analysis to both on-premises and public cloud workloads, in real-time, enabling it to add value far beyond any cloud-specific or hypervisor-specific, point-in-time tools that may be available. Besides being multi-vendor, multi-cloud, and real-time by design, IWO does not force you to choose between performance assurance and cost/resource optimization.

Source: cisco.com

Thursday, 14 April 2022

Expanding workloads for UCS X-Series with UCS X-Fabric Technology

First, we increased the amount of storage, going from two drives on a B-Series to six drives on an X-Series. Now we are adding GPUs to the X-Series that were previously only available in rack servers.

How does it work?

The VIC (Virtual Interface Card) on the server node connects to the UCS X9416 X-Fabric Module, which in turn connects to the UCS X440p PCIe Node holding the GPUs. This elegant, easily upgradable, cable-free solution is only possible with a midplane-free chassis design like the UCS X-Series.

Cisco UCS X9416 X-Fabric Module

The first UCS X-Fabric Technology module is PCIe Gen 4 expansion for the UCS X210c M6 Compute Node. The two X9416 X-Fabric Modules expand the PCIe bus from the server to the UCS X440p PCIe Node. No cables, no fuss, no muss.


Cisco UCS X440p PCIe Node


More and more applications can benefit from accelerators like GPUs. Ranging from AI/ML to the stalwart VDI, adding one or more GPUs to a server can greatly improve the user experience and application performance.

The Cisco UCS X440p PCIe Node allows you to add up to four GPUs to a Cisco UCS X210c Compute Node in conjunction with the UCS X9416 X-Fabric Module.


Different workloads require different types of GPUs. Cisco initially supports:

◉ Up to two Nvidia A100 Tensor Core GPUs
◉ Up to two Nvidia A16 GPUs
◉ Up to two Nvidia A40 GPUs
◉ Up to four Nvidia T4 Tensor Core GPUs

The modular design of UCS X-Series allows you to decouple the CPU and GPU refresh cycles. GPU suppliers like Nvidia, AMD, and Intel release their products at a different cadence than CPUs. If your application performance is sensitive to GPU performance, being able to simply slide in a UCS X440p PCIe Node with the latest GPUs allows you to extend the investment in all the other solution components (chassis, servers, IFMs, and PSUs), providing a better overall TCO.

Cisco Intersight


Management has always been a UCS superpower. Being able to manage every component of the X-Series in a single app is paramount. From inventory to firmware updates, you manage all the UCS X-Fabric Technology components with the same process you use to manage the servers and Intelligent Fabric Modules.

Run any app


Your requirements for modern, accelerated workloads shouldn’t dictate your server form factor. They should seamlessly integrate into a system that also runs all your traditional applications from core infrastructure, to database, to VSI. A single system, managed from the cloud, spanning all your workloads not only simplifies your environment, but will allow you to focus on business needs, not figuring out what unique hardware is needed for a specific application.

Source: cisco.com

Wednesday, 13 April 2022

Hybrid cloud networks are calling – Cisco Nexus Dashboard has the answer

The scale and complexity of modern enterprise infrastructure environments are exploding as workloads become pervasive and infrastructure becomes more hybrid and distributed across data centers, edge, and public cloud resources. Recent events have put the importance of successful network operations at the forefront. Through a barrage of obstacles, today’s resilient and successful organizations are modernizing and automating their network operations to stay ahead of the curve. Their destination? Hybrid Cloud — and by a vast majority.

According to IDC research, 55% of organizations currently have hybrid cloud and use it as a framework to deploy applications where scale-out architecture and high availability networks are needed. Another 29% reported not having a hybrid cloud, but they planned to create one within a year. And by 2025, 70% of organizations will modernize their applications based on drivers like data security, organizational flexibility and agility, and productivity gains versus drivers like IT cost savings.

To solve hybrid cloud application complexity, your IT needs to focus on automating application infrastructure management. This spans the many personas involved in configuration, provisioning, lifecycle operations, and orchestration of cloud and hybrid data center environments. But scaling applications into the hybrid cloud also increases the cost of managing thousands of distributed devices, containers, and network services, which take more knowledge and more time to troubleshoot because the many interconnected parts require multiple purpose-built tools and methods.

To be successful, it’s critical to have visibility and insights into your network wherever your data is created, with consistent network and policy orchestration across multiple data centers, whether on-premises, in the cloud, or at the edge. At Cisco, we know that any move along the hybrid cloud lifecycle journey brings an uncompromising need for a centralized approach to managing network capabilities. Cisco Nexus Dashboard is our newest cloud networking platform innovation to help with this very problem. With its One View presentation of all your hybrid cloud network sites, your IT operators can use a single agile platform to operate all their network infrastructure in one place. How many split personas (NetOps, DevOps, SecOps, and CloudOps) do you have? Cisco Nexus Dashboard bridges the tools needed by each persona with a flexible operational model for all use cases on a single platform.

Figure: Cisco Nexus Dashboard: Centralized hybrid cloud networking platform

We listened, here are the recent innovations that you asked for:

Bolster Cloud Neutral Support


Recent innovations include expansion to the hybrid cloud with added support for Google Cloud, simplifying network management across multiple public cloud sites. Nexus Dashboard is available in the AWS and Azure marketplaces and will also be featured in the Google Cloud marketplace.

Improved Intelligence and Site Management

Key new features support air-gapped environments and simplify the user experience. For example, before an app is installed or upgraded, the dashboard predetermines the resources the app and environment need to run smoothly and checks whether enough are available, reducing app downtime due to resource challenges.

With air gap support, customers that are not connected to the Cisco cloud can utilize insights advisory features to better identify risks to their infrastructure and get updates on PSIRTs, EOS/EOL notices, field notices, and more. Syslog support, as well as customization and personalization features, are newly available in the interface.

Decreased Dependence on Physical Hardware

Additional scale improvements are being implemented with the virtual form factor of Nexus Dashboard, where additional physical hardware is not required to run the Nexus Dashboard in your environment. Please refer to the Nexus Dashboard datasheet for more details.

End to End Visibility


External devices such as firewalls, and integrations such as vCenter, offer broader visibility, correlated telemetry, and deeper insights beyond the core network. For end-to-end visibility, it is imperative to understand where exactly a problem truly lies, which allows for quick remediation. L4-L7 and cross-domain integrations are a key strategy for gaining comprehensive visibility. With the new vCenter integration, the Insights function can incorporate virtualized workload data, such as hypervisor name, VM name, and VM health, into telemetry. This enables visibility across silos and into virtualized environments, and it enables faster MTTR for customers by correlating network events with application issues.


The Insights function is also able to enable optimal network and application performance and ensure continuous availability with recent AppDynamics SaaS support.

Considering recent news of companies whose changes resulted in outages, we’d like to emphasize that pre-change validation with upgrade assist is a key capability of the Insights function. It evaluates configuration changes before they are deployed, allowing IT Ops to make changes with confidence and preventing unintended consequences that take down applications or the network. I encourage you to check out these Nexus Dashboard capabilities.
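Conceptually, pre-change validation runs a proposed configuration through a set of checks before anything is pushed to the network. The sketch below uses generic, invented checks; it is not the Insights function's actual rule set.

```python
# Hedged sketch of pre-change validation: evaluate a proposed config
# against checks before deployment. Checks are generic illustrations.

def no_duplicate_vlan_ids(cfg):
    vlans = [v["id"] for v in cfg.get("vlans", [])]
    return len(vlans) == len(set(vlans))

def mtu_within_range(cfg):
    return all(1500 <= i["mtu"] <= 9216 for i in cfg.get("interfaces", []))

CHECKS = [no_duplicate_vlan_ids, mtu_within_range]

def validate(proposed_cfg):
    """Return the names of checks the proposed change would violate."""
    return [c.__name__ for c in CHECKS if not c(proposed_cfg)]

proposed = {"vlans": [{"id": 10}, {"id": 10}],
            "interfaces": [{"mtu": 9000}]}
print(validate(proposed))   # -> ['no_duplicate_vlan_ids']
```

A non-empty result blocks the deployment, which is the "change with confidence" property: the mistake is caught in analysis, not in production.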

Which performance zone is right for you?


Customers have different reasons for operating in the various performance zones (be it OKRs, metrics, speed of the business, or foundational capabilities) in terms of people, process, and technology alignment.


With the Nexus Dashboard, we help customers figure out where they are in the network infrastructure automation journey, and then help them move their performance zone from reactive to proactive, then to optimizing, and ultimately toward visionary self-healing, self-driving, and self-diagnosing networks.

Source: cisco.com

Tuesday, 12 April 2022

Announcing Risk-Based Endpoint Security with Cisco Secure Endpoint and Kenna Security

With a tidal wave of vulnerabilities out there and brand-new vulnerabilities coming out daily, security teams have a lot to handle. Addressing every single vulnerability is nearly impossible and prioritizing them is no easy task either since it’s difficult to effectively focus on the small number of vulnerabilities that matter most to your organization. Moreover, the shift to hybrid work makes it harder to assess and prioritize your vulnerabilities across your endpoints with traditional vulnerability scanners.

Kenna Security maps out the vulnerabilities in your environment and prioritizes the order in which you should address them based on a risk score. We’re excited to announce that after Cisco acquired Kenna Security last year, we have recently launched an integration between Kenna and Cisco Secure Endpoint to add valuable vulnerability context into the endpoint.

With this initial integration, Secure Endpoint customers can now perform risk-based endpoint security. It enables customers to prioritize endpoint protection and enhances threat investigation to accelerate incident response with three main use cases:

1. Scannerless vulnerability visibility: In a hybrid work environment, it’s increasingly difficult for traditional vulnerability scanners to account for all devices being used. Instead of relying on IP address scanning to identify vulnerabilities in an environment, you can now use the existing Secure Endpoint agent to get a complete picture of the vulnerabilities you need to triage.

2. Risk-based vulnerability context: During incident response, customers now have an additional data point in the form of a Kenna risk score. For example, if a compromised endpoint has a risk score of 95+, there is a high likelihood that the attack vector relates to a vulnerability that Kenna has identified. This can dramatically speed up incident response by helping the responder focus on the right data.

3. Accurate, actionable risk scores: Organizations often struggle to prioritize the right vulnerabilities because most risk scores, such as the Common Vulnerability Scoring System (CVSS), are static and lack important context. In contrast, the Kenna Risk Score is dynamic and rich in context, using advanced data science techniques such as predictive modeling and machine learning to account for real-world threats. This enables you to understand the actual level of risk in your environment and allows you to effectively prioritize and remediate the most important vulnerabilities first.
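The difference between a static severity and a context-aware score can be sketched as follows. The weights here are invented purely for illustration; Kenna's actual scoring model is proprietary.

```python
# Hedged sketch: order remediation by a dynamic, context-boosted risk
# score rather than a static severity. Weights are invented.

vulns = [
    {"cve": "CVE-A", "cvss": 9.8, "active_breach": False, "malware": False},
    {"cve": "CVE-B", "cvss": 7.5, "active_breach": True,  "malware": True},
    {"cve": "CVE-C", "cvss": 5.0, "active_breach": False, "malware": False},
]

def risk_score(v):
    score = v["cvss"] * 5                  # static base, scaled toward 0-100
    if v["active_breach"]:
        score += 30                        # exploited in the wild
    if v["malware"]:
        score += 20                        # known malware uses it
    return min(score, 100)

for v in sorted(vulns, key=risk_score, reverse=True):
    print(v["cve"], risk_score(v))
```

Note the outcome: the CVSS 7.5 vulnerability that is actively exploited outranks the CVSS 9.8 one that is not, which is the whole point of real-world threat context.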

How does the Kenna integration work?

The Kenna integration brings Kenna Risk Scores directly into your Secure Endpoint console. As an example of this integration, the computer in the screenshot below (Figure 1) has been assigned a Kenna Risk Score of 100.

Figure 1: Kenna Risk Score in the Secure Endpoint console

Risk scores can be anywhere from 0 (lowest risk) to 100 (highest risk). The score is inferred based on the reported OS version, build, and revision update information, combined with threat intelligence on vulnerabilities from Kenna.

Clicking on the numeric score itself brings you to a page with a detailed listing of all vulnerabilities present on the endpoint (see Figure 2 below).

Figure 2: List of all vulnerabilities on an endpoint

Each vulnerability has a risk score, an identifier, and a description that includes icons with additional details based on vulnerability intelligence from Kenna:

◉ Active Internet Breach: This vulnerability is being exploited in active breaches on the Internet

◉ Easily Exploitable: This vulnerability is easy to exploit, with proof-of-concept code potentially available

◉ Malware Exploitable: There is known malware exploiting this vulnerability


All of this information is extremely valuable context during an incident investigation. Exploiting vulnerabilities is one of the most common ways malicious actors carry out attacks, so by quickly understanding which vulnerabilities are present in the environment, incident responders have a much easier time homing in on how an attacker got into their organization.

Additionally, for vulnerabilities that currently have fixes available, clicking on the green “Fix Available” button on each vulnerability displays a box with links to the applicable patches, knowledge base articles, and other relevant information (see Figure 3 below). This gives analysts the information they need to efficiently act on an endpoint.

Figure 3: Recommended fixes for each vulnerability

Who can access the Kenna integration?


Vulnerability information and Risk Scores from Kenna Security are now available in the Cisco Secure Endpoint console for:

◉ Windows 10 computers running Secure Endpoint Windows Connector version 7.5.3 and newer
◉ Customers with a Secure Endpoint Advantage or Premier tier license, including Secure Endpoint Pro

Most vulnerabilities in our customer base occur on Windows 10 workstations, so we decided to release first with Windows 10 to deliver this integration faster. We plan on adding support for other Windows versions and operating systems such as Windows 11, Windows Server 2016, 2019, and 2022 in the near future.

We hope that you find this integration useful! This is the first of many steps that we are taking to incorporate vulnerability information from Kenna Security into Secure Endpoint, and we are excited to see what other use cases we can enable for our customers.

The Cisco Secure Choice Enterprise Agreement is a great way to adopt and experience the complete Secure Endpoint and Kenna technology stack.  It provides instant cost savings, the freedom to grow, and you only pay for what you need.

Source: cisco.com

Sunday, 10 April 2022

Supercharging indoor IoT management – Cisco DNA Spaces IoT Services Policy Engine

IoT Management at scale

Cisco DNA Spaces IoT Services provides tools to manage a myriad of IoT devices easily. However, managing these IoT devices was still a manual operation: each device had to be individually onboarded and configured, and if there was an error, it needed to be manually reconfigured. This becomes cumbersome as the number of managed devices increases, and manual maintenance at that scale is equally taxing. How do we know when a device is about to run out of battery? How do we ensure that customer experience is not impacted if someone moves a beacon from one zone to another? How do we roll out firmware upgrades without impacting operations? Even with IoT Management, these problems remained intractable at scale.

IoT Services Policy Engine

Enter Cisco DNA Spaces IoT Services Policy Engine. IoT Service policies are use-case-based and address unique problems that the scale and complexity of a large IoT deployment entail. Devices no longer need to be individually onboarded to deploy a use case. Customized policies can be created beforehand and associated with a class of devices at a specific location. Whenever a new device is turned on, it inherits the policy associated with that location and gets auto-configured. IoT Services even provides policy templates to support single-click use case deployment.

Groups

Policies are configured to act on device groups, which logically organize classes of devices. Groups can be created manually or based on logical criteria such as beacon location, manufacturer, or MAC address prefix. Say a customer wants to enable Asset Tracking on all the beacons in a certain zone of a building. The customer first creates a dynamic group targeting that zone. Whenever DNA Spaces locates a beacon in the zone, it automatically assigns it to the group. Group assignment for a beacon is propagated through firehose notifications as well.
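Dynamic group membership of this kind amounts to criteria matching: a beacon joins every group whose configured criteria it satisfies, and leaves a group when it no longer matches (for example, after moving to another zone). A minimal sketch, with hypothetical names rather than the actual DNA Spaces data model:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Beacon:
    mac: str
    manufacturer: str
    zone: str

@dataclass
class DynamicGroup:
    name: str
    # Unset criteria act as wildcards; set ones must all match.
    zone: Optional[str] = None
    manufacturer: Optional[str] = None
    mac_prefix: Optional[str] = None
    members: set = field(default_factory=set)

    def matches(self, b: Beacon) -> bool:
        if self.zone and b.zone != self.zone:
            return False
        if self.manufacturer and b.manufacturer != self.manufacturer:
            return False
        if self.mac_prefix and not b.mac.startswith(self.mac_prefix):
            return False
        return True

def assign_groups(beacon: Beacon, groups: list) -> list:
    """Re-evaluate membership whenever a beacon is (re)located.
    Returns the names of the groups the beacon now belongs to."""
    joined = []
    for g in groups:
        if g.matches(beacon):
            g.members.add(beacon.mac)
            joined.append(g.name)
        else:
            g.members.discard(beacon.mac)  # e.g. beacon moved out of the zone
    return joined
```

Because membership is re-evaluated on each location update, the group always reflects the current state of the floor, with no manual bookkeeping.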

Fig #1: Dynamic Grouping

Policies


Policies help in rolling out use cases across device groups. Each policy solves a specific customer use case and comes with a suggested policy template that makes it easy to roll out across a group. Customers can thus deploy a policy once, and DNA Spaces IoT Services ensures that the use case is always enforced across all the targeted beacons. This completely eliminates the need to manually onboard or maintain IoT devices.

Fig #2: Policy Configuration

Once a policy is deployed, IoT Services also displays the number and list of devices to which the policy was applied.

Fig #3: Policy Device count

Alerts


An alert is generated whenever a policy is applied or fails to apply. Alerts may be system alerts, viewable in the DNA Spaces dashboard, or notification alerts such as emails. Notification alerts are batched and delivered every 15 minutes.
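The 15-minute batching of notification alerts can be illustrated with a small accumulator that flushes pending alerts once the delivery window has elapsed. This is a conceptual sketch, not Cisco code; the class and parameter names are invented for illustration:

```python
import time

BATCH_INTERVAL_S = 15 * 60  # the 15-minute delivery window described above

class AlertBatcher:
    """Accumulates alerts and delivers them in batches once per interval."""

    def __init__(self, send, interval=BATCH_INTERVAL_S, clock=time.monotonic):
        self.send = send            # callable delivering a list of alerts (e.g. one email)
        self.interval = interval
        self.clock = clock          # injectable clock makes the logic testable
        self.pending = []
        self.last_flush = clock()

    def add(self, alert):
        self.pending.append(alert)
        self.maybe_flush()

    def maybe_flush(self):
        # Deliver everything accumulated since the last flush, but only
        # after the interval has elapsed.
        if self.pending and self.clock() - self.last_flush >= self.interval:
            self.send(list(self.pending))
            self.pending.clear()
            self.last_flush = self.clock()
```

Batching keeps a flapping policy from flooding an inbox: however many alerts fire inside a window, recipients see at most one delivery per interval.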

Fig #4: Policy Alert

Alerts are especially important for monitoring and security-based policies such as battery monitoring or beacon spoofing.

A New Era


The Cisco DNA Spaces IoT Services Policy Engine ushers in a new era of hands-free enterprise IoT management. It brings together unmatched processing and machine intelligence to deliver a management experience hitherto unseen in enterprise IoT. With new policies being added over time, it is set to become a bedrock of IoT management.

Source: cisco.com

Saturday, 9 April 2022

Addressing the noisy neighbor syndrome in modern SANs

The noisy neighbor syndrome on cloud computing infrastructures

The noisy neighbor syndrome (NNS) is a problematic situation often found in multi-tenant infrastructures. IT professionals usually associate this figurative expression with cloud computing. It manifests when a co-tenant virtual machine monopolizes resources such as network bandwidth, disk I/O, or CPU and memory, ultimately degrading the performance of other VMs and applications. Without proper safeguards, appropriate and predictable application performance is difficult to achieve, resulting in end-user dissatisfaction.

The noisy neighbor syndrome originates from common resources being shared in some unfair way. In a world of finite resources, if someone takes more than their fair share, others only get the leftovers. To some extent, it is acceptable for some VMs to use more resources than others, but this should not come with a reduction in performance for the less demanding VMs. This is arguably one of the main reasons why many organizations prefer not to virtualize their business-critical applications: it reduces the risk of exposing those systems to noisy neighbor conditions.

To tackle the noisy neighbor syndrome on hosts, different solutions have been considered. One is to reserve resources for applications; the downside is reduced average infrastructure utilization, increased cost, and artificial limits on the vertical scaling of some workloads. Another is to rebalance and optimize workloads across the hosts in a cluster; tools exist to resize or reallocate VMs to hosts for better performance, but at the expense of an additional level of complexity.

In other cases, greedy workloads might be best served on a bare metal server rather than virtualized. Going bare metal can address the noisy neighbor challenge at the host level, because bare metal servers are single tenant, with dedicated CPU and RAM. However, the network and the centralized storage system remain shared, multi-tenant resources. Infrastructure over-commitment due to greedy workloads remains a possibility and would limit overall performance.

The noisy neighbor syndrome on storage area networks

Generalizing the concept, the noisy neighbor syndrome can also be associated with storage area networks (SANs), where it is more typically described in terms of congestion. Four well-categorized situations cause congestion at the network level: poor link quality, lost or insufficient buffer credits, slow drain devices, and link overutilization.

The noisy neighbor syndrome does not manifest with poor link quality, lost or insufficient buffer credits, or slow drain devices, because those are essentially underperforming links or devices. It is instead primarily associated with link overutilization. Also, the noisy neighbor terminology refers to a server, not a disk, because communication, whether reads or writes, originates from initiators, not targets.


The SAN is a multi-tenant environment, hosting multiple applications and providing connectivity and data access to multiple servers. The noisy neighbor effect occurs when a rogue server or virtual machine uses a disproportionate quantity of the available network resources, such as bandwidth. This leaves insufficient resources for other endpoints on the same shared infrastructure, causing network performance issues.

Treatment for the noisy neighbor syndrome may happen at one or more levels, such as the host, network, and storage levels, depending on the circumstances. A common challenge arises when a backup application monopolizes bandwidth on ISLs for a long period of time, to the detriment of other systems in the environment: other applications are forced to reduce throughput or increase their wait time. This challenge is best solved at the network level. Another example is a virtualized application monopolizing the shared host connection; here the solution might involve remediation at both the host and network levels. Intuitively, this phenomenon becomes more pervasive as the number of hosts and applications in a data center environment increases.

Strategies to solve the noisy neighbor syndrome


The solution to the noxious noisy neighbor syndrome is not found by statically assigning resources to all applications in a democratic way. Not all applications need the same quantity of resources or have the same priority, and dividing available resources into equal parts would not do justice to the heaviest, and often mission-critical, applications. Also, the need for resources may change over time and be hard to predict with any accuracy.

The true solution for silencing noisy neighbors comes from ensuring that every application in a shared infrastructure receives the necessary resources when needed. This is possible by designing and properly sizing the data center infrastructure: it should be able to sustain the aggregate load at any time and include ways to dynamically allocate resources based on need. In other words, instead of provisioning your data center for the average load, you should design for the peak load, or close to it.

At the storage network level, the best way to solve the noisy neighbor challenge is through proper design and by adding bandwidth, as well as frame buffers, to your SAN. At the same time, make sure storage devices can handle input/output operations per second (IOPS) above and beyond the typical demand. Multiport all-flash storage arrays can reach IOPS levels in the millions. Their adoption has virtually eliminated storage I/O contention on the controllers and media, shifting the focus onto storage networks.

Overprovisioning resources is an expensive strategy and not always a possibility. Some companies prefer to avoid it and postpone investments, striving for a balance between infrastructure cost and an acceptable level of performance. When shared resources are insufficient to satisfy all needs simultaneously, a possible line of defense is prioritization: mission-critical applications are served appropriately, while accepting that less important ones may be impacted.

Features like network and storage quality of service (QoS) can control IOPS and throughput for applications, limiting the noisy neighbor effect. By setting IOPS limits, port rate limits, and network priority, we can control the quantity of resources each application receives, so that no single server or application instance monopolizes resources and hinders the performance of others. The drawback of the QoS approach is the added administrative burden: it takes time to determine the priority of individual applications and to configure the network and storage devices accordingly. This explains the low adoption of this methodology.
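A port rate limit of the kind mentioned above is commonly modeled as a token bucket: tokens accrue at the configured rate, a frame is admitted only if enough tokens are available, and unused capacity accumulates up to a burst allowance. This is a generic sketch of the technique, not how the switch implements it; real MDS rate limits are configured on the switch, and the numbers here are arbitrary:

```python
class TokenBucket:
    """Generic token-bucket rate limiter (tokens measured in MB here)."""

    def __init__(self, rate_mb_per_s: float, burst_mb: float):
        self.rate = rate_mb_per_s   # refill rate
        self.capacity = burst_mb    # maximum burst size
        self.tokens = burst_mb      # start full
        self.last = 0.0             # timestamp of the last check

    def allow(self, now: float, frame_mb: float) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= frame_mb:
            self.tokens -= frame_mb
            return True
        return False                # over the limit: traffic is held back
```

The effect is exactly what QoS aims for: a greedy sender can burst briefly, but its sustained throughput is capped at the configured rate, leaving headroom for everyone else.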

Another consideration is that the traffic profile of applications changes over time. Fast detection and identification of SAN congestion might not be sufficient: the traditional methods for fixing SAN congestion are manual and unable to react quickly to changing traffic conditions. Ideally, prefer a dynamic solution that adjusts the allocation of resources to applications.

Cisco MDS 9000 to the rescue


The Cisco MDS 9000 Series of switches provides a set of nifty capabilities and high-fidelity metrics that can help address the noisy neighbor syndrome at the storage network layer. First and foremost, 64G FC technology coupled with a generous allocation of port buffers helps eliminate bandwidth bottlenecks, even over long distances. In addition, proper design can alleviate network contention, including a low oversubscription ratio and making sure ISL aggregate bandwidth matches or exceeds overall storage bandwidth.

Several monitoring options, including the Cisco Port-Monitor (PMON) feature, provide policy-based configuration to detect, notify, and take automatic port-guard actions to prevent any form of congestion. Application prioritization can be achieved by configuring QoS at the zone level. Port rate limits can impose an upper bound on voracious workloads. Automatic buffer credit recovery mechanisms, link diagnostic features, and preventive link quality assessment using advanced Forward Error Correction techniques help address congestion caused by poor link quality or lost and insufficient buffer credits. The list of remedies includes Fabric Performance Impact Notification and Congestion Signals (FPIN), once host drivers and HBAs support that standards-based feature. But there is more.

Cisco MDS Dynamic Ingress Rate Limiting (DIRL) software prevents congestion at the storage network level with an exclusive approach based on an innovative buffer-to-buffer credit pacing mechanism. Not only does DIRL immediately detect slow drain and overutilization in any network topology, it also takes action to remediate them. The goal is to reduce or eliminate congestion by providing the end device the amount of data it can accept, and no more. The result is a dynamic allocation of bandwidth to all applications, eventually eliminating congestion from the SAN. What is especially interesting about DIRL is that it is network-centric and requires no compatibility with end hosts.
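The feedback idea behind DIRL can be sketched as a simple control loop: while a port is causing congestion, its ingress rate is backed off aggressively; once the port is healthy again, the rate recovers gradually toward line rate. The real mechanism operates on buffer-to-buffer credits inside the switch, and the multipliers below are invented for illustration, not Cisco's actual parameters:

```python
def adjust_rate(current_pct: float, congested: bool,
                decrease: float = 0.5, recover: float = 1.25,
                floor_pct: float = 1.0, max_pct: float = 100.0) -> float:
    """One interval of a DIRL-style feedback loop.

    current_pct: current ingress rate as a percentage of line rate.
    Returns the rate to apply for the next interval: halved while the
    port causes congestion, gradually restored once it is healthy.
    """
    if congested:
        return max(floor_pct, current_pct * decrease)   # back off quickly
    return min(max_pct, current_pct * recover)          # recover slowly
```

Fast decrease and slow recovery mean the fabric converges on the rate the end device can actually absorb, which is exactly the self-tuning behavior described in the next paragraph's scenario.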

The diagram below shows a noisy neighbor host becoming active and monopolizing network resources, causing throughput degradation for two innocent hosts. Now let's enable DIRL on the Cisco MDS switches. When the same scenario is repeated, DIRL prevents the rogue host from monopolizing network resources and gradually adjusts it to a performance level where the innocent hosts see no impact. With DIRL, the storage network self-tunes and reaches a state where all the neighbors happily coexist.


Trouble-free operation of the network can be verified with Nexus Dashboard Fabric Controller, the graphical management tool for Cisco SANs. Its slow drain analysis menu can report congestion at the port level and gives administrators an easy-to-interpret, color-coded display. Similarly, the deep traffic visibility offered by the SAN Insights feature can expose metrics at the FC flow level in real time. This further validates optimal network performance or helps evaluate possible design improvements.

Final note


In conclusion, the Cisco MDS 9000 Series provides all the capabilities necessary to counter and eliminate the noisy neighbor syndrome at the storage network level. By combining proper network design with high-speed links, congestion avoidance techniques such as DIRL, slow drain analysis, and SAN Insights, IT administrators can deliver an optimal data access solution on a shared network infrastructure. And don't fret if your network and storage utilization doesn't come close to 100%; in a way, that is your safeguard against the noisy neighbor syndrome.

Source: cisco.com