
Thursday, 1 February 2024

Reimagine Your Data Center for Responsible AI Deployments

Most days of the week, you can expect to see AI- and/or sustainability-related headlines in every major technology outlet. But finding a solution that is future-ready, with the capacity, scale, and flexibility that generative AI requires, and that is designed with sustainability in mind, is far harder to come by.

Cisco is evaluating the intersection of just that – sustainability and technology – to create a more sustainable AI infrastructure that addresses the implications of what generative AI will do to the amount of compute needed in our future world. Advancements in AI/ML data center infrastructure bring both challenges and opportunities, and they can be at odds with goals related to energy consumption and greenhouse gas (GHG) emissions.

Addressing this challenge entails an examination of multiple factors, including performance, power, cooling, space, and the impact on network infrastructure. There’s a lot to consider. The following list lays out some important issues and opportunities related to AI data center environments designed with sustainability in mind:

1. Performance Challenges: The use of Graphics Processing Units (GPUs) is essential for AI/ML training and inference, but it can pose challenges for data center IT infrastructure from power and cooling perspectives. As AI workloads require increasingly powerful GPUs, data centers often struggle to keep up with the demand for high-performance computing resources. Data center managers and developers, therefore, benefit from strategic deployment of GPUs to optimize their use and energy efficiency.

2. Power Constraints: AI/ML infrastructure is constrained primarily by compute and memory limits. The network plays a crucial role in connecting multiple processing elements, often sharding compute functions across various nodes. This places significant demands on power capacity and efficiency. Meeting stringent latency and throughput requirements while minimizing energy consumption is a complex task requiring innovative solutions.

3. Cooling Dilemma: Cooling is another critical aspect of managing energy consumption in AI/ML implementations. Traditional air-cooling methods can be inadequate in AI/ML data center deployments, and they can also be environmentally burdensome. Liquid cooling solutions offer a more efficient alternative, but they require careful integration into data center infrastructure. Liquid cooling consumes less energy than the forced-air cooling traditionally used in data centers.

4. Space Efficiency: As the demand for AI/ML compute resources continues to grow, there is a need for data center infrastructure that is both high-density and compact in its form factor. Designing with these considerations in mind can improve efficient space utilization and high throughput. Deploying infrastructure that maximizes cross-sectional link utilization across both compute and networking components is a particularly important consideration.

5. Investment Trends: Looking at broader industry trends, research from IDC predicts substantial growth in spending on AI software, hardware, and services. The projection indicates that this spending will reach $300 billion in 2026, a considerable increase from a projected $154 billion for the current year. This surge in AI investments has direct implications for data center operations, particularly in terms of accommodating the increased computational demands and aligning with ESG goals.

6. Network Implications: Ethernet is currently the dominant underpinning for AI for the majority of use cases that require cost economics, scale and ease of support. According to the Dell’Oro Group, by 2027, as much as 20% of all data center switch ports will be allocated to AI servers. This highlights the growing significance of AI workloads in data center networking. Furthermore, the challenge of integrating small form factor GPUs into data center infrastructure is a noteworthy concern from both a power and cooling perspective. It may require substantial modifications, such as the adoption of liquid cooling solutions and adjustments to power capacity.

7. Adopter Strategies: Early adopters of next-gen AI technologies have recognized that accommodating high-density AI workloads often necessitates the use of multisite or micro data centers. These smaller-scale data centers are designed to handle the intensive computational demands of AI applications. However, this approach places additional pressure on the network infrastructure, which must be high-performing and resilient to support the distributed nature of these data center deployments.

As a leader in designing and supplying the infrastructure that carries the world’s internet traffic, Cisco is focused on accelerating the growth of AI and ML in data centers with efficient energy consumption, cooling, performance, and space efficiency in mind.

These challenges are intertwined with the growing investments in AI technologies and the implications for data center operations. Addressing sustainability goals while delivering the necessary computational capabilities for AI workloads requires innovative solutions, such as liquid cooling, and a strategic approach to network infrastructure.

The new Cisco AI Readiness Index shows that 97% of companies say the urgency to deploy AI-powered technologies has increased. To address the near-term demands, innovative solutions must address key themes — density, power, cooling, networking, compute, and acceleration/offload challenges.

We want to start a conversation with you about the development of resilient and more sustainable AI-centric data center environments – wherever you are on your sustainability journey. What are your biggest concerns and challenges for readiness to improve sustainability for AI data center solutions?

Source: cisco.com

Tuesday, 9 May 2023

Disaster Recovery Solutions for the Edge with HyperFlex and Cohesity

The edge computing architecture comes with a variety of benefits. Placement of compute, storage, and network resources close to the location at which data is being generated typically improves response times and may reduce WAN-based network traffic between an edge site and a central data center. That said, the distributed nature of edge site architectures also introduces several challenges related to data protection and disaster recovery. One requirement is performing local backups with the ability to conduct local recovery operations. Another formidable challenge involves edge site disaster recovery. Planning for the inevitable edge site outage, be it temporary, extended, or permanent, is the problem this blog takes a deeper look into.


Business continuity planning focuses on items such as Recovery Point Objective (RPO) and Recovery Time Objective (RTO). These measurements are generally expressed in terms of a Service Level Agreement (SLA). Under the covers exists a collection of infrastructure building blocks that make adherence to an SLA possible. In simple terms, the building blocks include the ability to perform backups, the ability to create additional copies of backups, a methodology for transporting backup copies to remote locations (replication), an intuitive management interface, and a connection to a preconfigured recovery infrastructure.
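To make the RPO and RTO terminology concrete, here is a minimal Python sketch that checks whether a given outage stayed within its objectives. The SLA values and timestamps are purely illustrative assumptions, not figures from the solution described here.

```python
from datetime import datetime, timedelta

# Hypothetical SLA targets for an edge site (illustrative values only).
RPO = timedelta(hours=1)   # maximum tolerable data loss
RTO = timedelta(hours=4)   # maximum tolerable time to restore service

def rpo_met(last_successful_backup: datetime, failure_time: datetime) -> bool:
    """Data loss equals the gap between the last good backup and the failure."""
    return (failure_time - last_successful_backup) <= RPO

def rto_met(failure_time: datetime, service_restored_time: datetime) -> bool:
    """Downtime equals the gap between the failure and service restoration."""
    return (service_restored_time - failure_time) <= RTO

# Example: a backup at 09:30, an outage at 10:15, service restored at 13:00.
backup = datetime(2023, 5, 9, 9, 30)
outage = datetime(2023, 5, 9, 10, 15)
restored = datetime(2023, 5, 9, 13, 0)
print(rpo_met(backup, outage), rto_met(outage, restored))  # True True
```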

From an operational standpoint, an edge site disaster recovery solution includes workflows that enable the ability to:

◉ Perform workload failover from an edge site to a central site.
◉ Protect failed over workload at a central site.
◉ Reverse replicate protected workloads from a central site back to an edge site at the point where the edge site is ready to receive inbound replication traffic.
◉ Failover again such that the edge site once again hosts production workloads.
◉ Test these operations without impacting production workloads.

Should an edge site failure or outage occur, workload failover to a disaster recovery site may become necessary. (Quite obviously, disaster recovery operations should be tested on an ongoing basis rather than just hoping things will work.) At the point where workload failover has been completed successfully, the failed over workload requires data protection. At the point where the edge site has been returned to an operational state, backup copies should be replicated back to the edge site. Alternatively, a new or different edge site may replace the original edge site. At some point, workload transition from the central site back to the edge site will occur.

HyperFlex with Cohesity DataProtect


Cohesity provides a number of DataProtect solutions to assist users in meeting data protection and disaster recovery business requirements. The Cohesity DataProtect product is available as a Virtual Edition and can be deployed as a single virtual machine hosted on a HyperFlex Edge cluster. A predefined small or large configuration is available for selection when the product is installed. The Cohesity DataProtect solution is also available in a ROBO Edition, running on a single Cisco UCS server.

Cohesity DataProtect edge solutions provide local protection of virtual machine workloads and can also replicate local backups to a larger centralized Cohesity cluster deployed on Cisco UCS servers.


Cohesity protection groups define the workloads to protect. Protection groups also include a policy that defines the frequency and retention period for local backups. The policy additionally defines a replication destination, a replication frequency, and the retention period for replicated backups.
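As a rough illustration of the relationships described above, the sketch below models a protection group and its policy as plain Python data structures. The field names and values are invented for illustration and do not reflect Cohesity’s actual API or object model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProtectionPolicy:
    # Local backup schedule and retention (illustrative units).
    backup_frequency_hours: int
    local_retention_days: int
    # Replication to a larger centralized Cohesity cluster on Cisco UCS servers.
    replication_target: str
    replication_frequency_hours: int
    replica_retention_days: int

@dataclass
class ProtectionGroup:
    name: str
    policy: ProtectionPolicy
    protected_vms: List[str] = field(default_factory=list)

edge_pg = ProtectionGroup(
    name="edge-site-01-vms",
    policy=ProtectionPolicy(
        backup_frequency_hours=4,
        local_retention_days=14,
        replication_target="central-cohesity-cluster",
        replication_frequency_hours=4,
        replica_retention_days=90,
    ),
    protected_vms=["pos-app-vm", "inventory-db-vm"],
)
```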

In summary, Cisco HyperFlex with Cohesity DataProtect has built-in workflows that enable easy workload failover and failover testing. At the point where reverse replication can be initiated, a simple policy modification is all that is required. Cohesity also features Helios, a centralized management facility that enables the entire solution to be managed from a single web-based console.

Source: cisco.com

Saturday, 29 April 2023

Cisco Nexus 9000 Intelligent Buffers in a VXLAN/EVPN Fabric

As customers migrate to network fabrics based on Virtual Extensible Local Area Network/Ethernet Virtual Private Network (VXLAN/EVPN) technology, questions about the implications for application performance, Quality of Service (QoS) mechanisms, and congestion avoidance often arise. This blog post addresses some of the common areas of confusion and concern, and touches on a few best practices for maximizing the value of using Cisco Nexus 9000 switches for Data Center fabric deployments by leveraging the available Intelligent Buffering capabilities.

What Is the Intelligent Buffering Capability in Nexus 9000?


Cisco Nexus 9000 series switches implement an egress-buffered shared-memory architecture, as shown in Figure 1. Each physical interface has 8 user-configurable output queues that contend for shared buffer capacity when congestion occurs. A buffer admission algorithm called Dynamic Buffer Protection (DBP), enabled by default, ensures fair access to the available buffer among any congested queues.

Figure 1 – Simplified Shared-Memory Egress Buffered Switch
 
In addition to DBP, two key features – Approximate Fair Drop (AFD) and Dynamic Packet Prioritization (DPP) – help to speed initial flow establishment, reduce flow-completion time, avoid congestion buildup, and maintain buffer headroom for absorbing microbursts.

AFD uses in-built hardware capabilities to separate individual 5-tuple flows into two categories – elephant flows and mouse flows:

◉ Elephant flows are longer-lived, sustained bandwidth flows that can benefit from congestion control signals such as Explicit Congestion Notification (ECN) Congestion Experienced (CE) marking, or random discards, that influence the windowing behavior of Transmission Control Protocol (TCP) stacks. The TCP windowing mechanism controls the transmission rate of TCP sessions, backing off the transmission rate when ECN CE markings, or un-acknowledged sequence numbers, are observed (see the “More Information” section for additional details).

◉ Mouse flows are shorter-lived flows that are unlikely to benefit from TCP congestion control mechanisms. These flows consist of the initial TCP 3-way handshake that establishes the session, along with a relatively small number of additional packets, and are subsequently terminated. By the time any congestion control is signaled for the flow, the flow is already complete.

As shown in Figure 2, with AFD, elephant flows are further characterized according to their relative bandwidth utilization – a high-bandwidth elephant flow has a higher probability of experiencing ECN CE marking, or discards, than a lower-bandwidth elephant flow. A mouse flow has a zero probability of being marked or discarded by AFD.

Figure 2 – AFD with Elephant and Mouse Flows

For readers familiar with the older Weighted Random Early Detect (WRED) mechanism, you can think of AFD as a kind of “bandwidth-aware WRED.” With WRED, any packet (regardless of whether it’s part of a mouse flow or an elephant flow) is potentially subject to marking or discards. In contrast, with AFD, only packets belonging to sustained-bandwidth elephant flows may be marked or discarded – with higher-bandwidth elephants more likely to be impacted than lower-bandwidth elephants – while a mouse flow is never impacted by these mechanisms.

Additionally, AFD marking or discard probability for elephants increases as the queue becomes more congested. This behavior ensures that TCP stacks back off well before all the available buffer is consumed, avoiding further congestion and ensuring that abundant buffer headroom still remains to absorb instantaneous bursts of back-to-back packets on previously uncongested queues.
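To illustrate the behavior described above, here is a simplified conceptual sketch of an AFD-style decision in Python. It is not the Nexus 9000 hardware algorithm; the threshold and probability curve are invented purely to show that only elephant flows are eligible for marking or discard, and that the probability grows with both the flow’s bandwidth and the queue’s congestion level.

```python
import random

# Illustrative threshold only; the real AFD parameters live in hardware.
ELEPHANT_BYTES_THRESHOLD = 1_000_000   # bytes seen before a 5-tuple flow counts as an elephant

def is_elephant(flow_bytes_seen: int) -> bool:
    """Classify a 5-tuple flow as elephant (sustained) or mouse (short-lived)."""
    return flow_bytes_seen >= ELEPHANT_BYTES_THRESHOLD

def afd_action(flow_bytes_seen: int, flow_rate_share: float, queue_fill: float) -> str:
    """
    Return 'forward' or 'mark/drop' for a packet.
    flow_rate_share: this flow's share of the congested queue's bandwidth (0..1).
    queue_fill: how full the shared-buffer queue is (0..1).
    """
    if not is_elephant(flow_bytes_seen):
        return "forward"              # mouse flows are never marked or discarded by AFD
    # Probability grows with the flow's bandwidth share and with queue congestion.
    probability = min(1.0, flow_rate_share * queue_fill)
    return "mark/drop" if random.random() < probability else "forward"

# A mouse flow is always forwarded; a high-bandwidth elephant on a congested
# queue is marked or dropped with high probability.
print(afd_action(50_000, 0.6, 0.9))        # forward (mouse)
print(afd_action(5_000_000, 0.6, 0.9))     # usually mark/drop (elephant)
```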

DPP, another hardware-based capability, promotes the initial packets in a newly observed flow to a higher-priority queue than they would have traversed “naturally.” Take, for example, a new TCP session establishment, consisting of the TCP 3-way handshake. If any of these packets sit in a congested queue, and therefore experience additional delay, it can materially affect application performance.

As shown in Figure 3, instead of enqueuing those packets in their originally assigned queue, where congestion is potentially more likely, DPP will promote those initial packets to a higher-priority queue – a strict priority (SP) queue, or simply a higher-weighted Deficit Weighted Round-Robin (DWRR) queue – which results in expedited packet delivery with a very low chance of congestion.

Figure 3 – Dynamic Packet Prioritization (DPP)

If the flow continues beyond a configurable number of packets, packets are no longer promoted – subsequent packets in the flow traverse the originally assigned queue. Meanwhile, other newly observed flows would be promoted and enjoy the benefit of faster session establishment and flow completion for short-lived flows.
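The DPP behavior can likewise be sketched in a few lines. Again, this is a conceptual model rather than the hardware implementation, and the packet threshold is an arbitrary stand-in for the configurable value mentioned above.

```python
DPP_PROMOTE_FIRST_N = 10   # stand-in for the configurable per-flow packet threshold

# Tracks how many packets have been seen for each 5-tuple flow.
packets_seen: dict[tuple, int] = {}

def select_queue(flow_5tuple: tuple, natural_queue: int, priority_queue: int) -> int:
    """Promote the first few packets of a newly observed flow to the priority queue."""
    count = packets_seen.get(flow_5tuple, 0) + 1
    packets_seen[flow_5tuple] = count
    if count <= DPP_PROMOTE_FIRST_N:
        return priority_queue      # e.g., a strict-priority or higher-weighted DWRR queue
    return natural_queue           # later packets use the originally assigned queue

flow = ("10.1.1.1", "10.2.2.2", 6, 34512, 443)   # hypothetical TCP 5-tuple
print([select_queue(flow, natural_queue=1, priority_queue=7) for _ in range(12)])
# -> the first 10 packets go to queue 7, the remaining packets to queue 1
```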

AFD and UDP Traffic


One frequently asked question about AFD is whether it’s appropriate to use it with User Datagram Protocol (UDP) traffic. AFD by itself does not distinguish between different protocol types; it only determines whether a given 5-tuple flow is an elephant or not. We generally state that AFD should not be enabled on queues that carry non-TCP traffic. That’s an oversimplification, of course – for example, a low-bandwidth UDP application would never be subject to AFD marking or discards because it would never be flagged as an elephant flow in the first place.

Recall that AFD can either mark traffic with ECN, or it can discard traffic. With ECN marking, collateral damage to a UDP-enabled application is unlikely. If ECN CE is marked, either the application is ECN-aware and would adjust its transmission rate, or it would ignore the marking completely. That said, AFD with ECN marking won’t help much with congestion avoidance if the UDP-based application is not ECN-aware.

On the other hand, if you configure AFD in discard mode, sustained-bandwidth UDP applications may suffer performance issues. UDP doesn’t have any inbuilt congestion-management mechanisms – discarded packets would simply never be delivered and would not be retransmitted, at least not based on any UDP mechanism. Because AFD is configurable on a per-queue basis, it’s better in this case to simply classify traffic by protocol, and ensure that traffic from high-bandwidth UDP-based applications always uses a non-AFD-enabled queue.

What Is a VXLAN/EVPN Fabric?


VXLAN/EVPN is one of the fastest growing Data Center fabric technologies in recent memory. VXLAN/EVPN consists of two key elements: the data-plane encapsulation, VXLAN; and the control-plane protocol, EVPN.

You can find abundant details and discussions of these technologies on cisco.com, as well as from many other sources. While an in-depth discussion is outside the scope of this blog post, when talking about QoS and congestion management in the context of a VXLAN/EVPN fabric, the data-plane encapsulation is the focus. Figure 4 illustrates the VXLAN data-plane encapsulation, with emphasis on the inner and outer DSCP/ECN fields.

Figure 4 – VXLAN Encapsulation

As you can see, VXLAN encapsulates overlay packets in IP/UDP/VXLAN “outer” headers. Both the inner and outer headers contain the DSCP and ECN fields.

With VXLAN, a Cisco Nexus 9000 switch serving as an ingress VXLAN tunnel endpoint (VTEP) takes a packet originated by an overlay workload, encapsulates it in VXLAN, and forwards it into the fabric. In the process, the switch copies the inner packet’s DSCP and ECN values to the outer headers when performing encapsulation.

Transit devices such as fabric spines forward the packet based on the outer headers to reach the egress VTEP, which decapsulates the packet and transmits it unencapsulated to the final destination. By default, both the DSCP and ECN fields are copied from the outer IP header into the inner (now decapsulated) IP header.
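The default handling of the DSCP and ECN fields can be summarized with a small conceptual sketch. The header structures below are simplified stand-ins (the UDP/VXLAN fields are omitted), but they show the copy from inner to outer header on encapsulation and from outer back to inner on decapsulation.

```python
from dataclasses import dataclass, replace

@dataclass
class IpHeader:
    src: str
    dst: str
    dscp: int
    ecn: int

@dataclass
class VxlanPacket:
    outer: IpHeader      # outer IP header (UDP/VXLAN wrapper omitted for brevity)
    vni: int
    inner: IpHeader      # original overlay packet's IP header

def ingress_vtep_encap(inner: IpHeader, vtep_src: str, vtep_dst: str, vni: int) -> VxlanPacket:
    # Default behavior: copy the inner DSCP and ECN values into the outer header.
    outer = IpHeader(src=vtep_src, dst=vtep_dst, dscp=inner.dscp, ecn=inner.ecn)
    return VxlanPacket(outer=outer, vni=vni, inner=inner)

def egress_vtep_decap(pkt: VxlanPacket) -> IpHeader:
    # Default behavior: copy DSCP and ECN from the outer header back to the inner header,
    # so any ECN CE marking applied in the fabric is preserved toward the destination.
    return replace(pkt.inner, dscp=pkt.outer.dscp, ecn=pkt.outer.ecn)

overlay = IpHeader(src="192.168.10.5", dst="192.168.20.7", dscp=26, ecn=0b01)
encapped = ingress_vtep_encap(overlay, "10.0.0.1", "10.0.0.2", vni=30001)
encapped.outer.ecn = 0b11          # a transit switch marks ECN CE on the outer header
print(egress_vtep_decap(encapped)) # the decapsulated inner header now carries the CE marking
```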

In the process of traversing the fabric, overlay traffic may pass through multiple switches, each enforcing QoS and queuing policies defined by the network administrator. These policies might simply be default configurations, or they may consist of more complex policies such as classifying different applications or traffic types, assigning them to unique classes, and controlling the scheduling and congestion management behavior for each class.

How Do the Intelligent Buffer Capabilities Work in a VXLAN Fabric?


Given that the VXLAN data plane is an encapsulation, packets traversing fabric switches consist of the original TCP, UDP, or other protocol packet inside an IP/UDP/VXLAN wrapper. This leads to the question: how do the Intelligent Buffer mechanisms behave with such traffic?

As discussed earlier, sustained-bandwidth UDP applications could potentially suffer from performance issues if traversing an AFD-enabled queue. However, we should make a key distinction here – VXLAN is not a “native” UDP application, but rather a UDP-based tunnel encapsulation. While there is no congestion awareness at the tunnel level, the original tunneled packets can carry any kind of application traffic – TCP, UDP, or virtually any other protocol.

Thus, for a TCP-based overlay application, if AFD either marks or discards a VXLAN-encapsulated packet, the original TCP stack still receives ECN marked packets or misses a TCP sequence number, and these mechanisms will cause TCP to reduce the transmission rate. In other words, the original goal is still achieved – congestion is avoided by causing the applications to reduce their rate.

Similarly, high-bandwidth UDP-based overlay applications would respond just as they would to AFD marking or discards in a non-VXLAN environment. If you have high-bandwidth UDP-based applications, we recommend classifying based on protocol and ensuring those applications get assigned to non-AFD-enabled queues.

As for DPP, while TCP-based overlay applications will benefit most, especially for initial flow-setup, UDP-based overlay applications can benefit as well. With DPP, both TCP and UDP short-lived flows are promoted to a higher priority queue, speeding flow-completion time. Therefore, enabling DPP on any queue, even those carrying UDP traffic, should provide a positive impact.

Key Takeaways


VXLAN/EVPN fabric designs have gained significant traction in recent years, and ensuring excellent application performance is paramount. Cisco Nexus 9000 Series switches, with their hardware-based Intelligent Buffering capabilities, ensure that even in an overlay application environment, you can maximize the efficient utilization of available buffer, minimize network congestion, speed flow-establishment and flow-completion times, and avoid drops due to microbursts.

Source: cisco.com

Tuesday, 7 March 2023

ACI Segmentation and Migrations made easier with Endpoint Security Groups (ESG)

Let’s open with a question: “How are you handling security and segmentation requirements in your Cisco Application Centric Infrastructure (ACI) fabric?”

I expect most answers will relate to constructs of Endpoint Groups (EPGs), contracts and filters. These concepts are the foundations of ACI. But as with any infrastructure capability, designs and customer requirements are constantly evolving, often leading to new segmentation challenges. That is why I would like to introduce a relatively recent, powerful option called Endpoint Security Groups (ESGs). Although ESGs were introduced in Cisco ACI a while back (version 5.0(1), released in May 2020), there is still ample opportunity to spread this functionality to a broader audience.

For those who have not explored the topic yet, ESGs offer an alternate way of handling segmentation with the added flexibility of decoupling this from the earlier concepts of forwarding and security associated with Endpoint Groups. This is to say that ESGs handle segmentation separately from the forwarding aspects, allowing more flexibility and possibility with each.

EPG and ESG – Highlights and Differences


The easiest way to manage endpoints with common security requirements is to put them into groups and control communication between them. In ACI, these groups have been traditionally represented by EPGs. Contracts that are attached to EPGs are used for controlling communication and other policies between groups with different postures. Although an EPG primarily provides network security, it must be married to a single bridge domain. This is because EPGs define both forwarding policy and security segmentation simultaneously. This direct relationship between a Bridge Domain (BD) and an EPG prevents an EPG from spanning more than one bridge domain. This design constraint is alleviated by ESGs. With ESGs, networking (i.e., forwarding policy) happens on the EPG/BD level, and security enforcement is moved to the ESG level.

Operationally, the ESG concept is similar to, and more straightforward than, the original EPG approach. Just as with EPGs, communication is allowed among any endpoints within the same group, but in the case of ESGs, this is independent of the subnet or BD they are associated with. For communication between different ESGs, we need contracts. That sounds familiar, doesn’t it? ESGs use the same contract constructs we have been using in ACI since its inception.

So, what are the benefits of ESGs then? In a nutshell, where EPGs are bound to a single BD, ESGs allow you to define a security policy that spans across multiple BDs. This is to say you can group and apply policy to any number of endpoints across any number of BDs under a given VRF.  At the same time, ESGs decouple the forwarding policy, which allows you to do things like VRF route leaking in a much simpler and more intuitive manner.

ESG. A Simple Use Case Example


To give an example of where ESGs could be useful, consider a brownfield ACI deployment that has been in operation for years. Over time things tend to grow organically. You might find you have created more and more EPG/BD combinations but later realize that many of these EPGs actually share the same security profile. With EPGs, you would be deploying and consuming more contract resources to achieve what you want, plus potentially adding to your management burden with more objects to keep an eye on. With ESGs, you can now simply group all these brownfield EPGs and their endpoints and apply the common security policies only once. What is important is that you can do this without changing anything related to the IP addressing or BD settings these endpoints use to communicate.

So how do I assign an endpoint to an ESG? You do this with a series of matching criteria. In the first release of ESGs, you were limited in the kinds of matching criteria. Starting from ACI 5.2(1), we have expanded matching criteria to provide more flexibility for endpoint classification and ease for the user. Among them: Tag Selectors (based on MAC, IP, VM tag, subnet), whole EPG Selectors, and IP Subnet Selectors. All the details about different selectors can be found here: https://www.cisco.com/c/en/us/td/docs/dcn/aci/apic/6x/security-configuration/cisco-apic-security-configuration-guide-60x/endpoint-security-groups-60x.html.
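Conceptually, an endpoint lands in an ESG when it matches one of the ESG’s selectors. The Python sketch below illustrates only that matching logic; the attribute names, selector functions, and ESG name are hypothetical and do not correspond to APIC object classes or API calls.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Endpoint:
    ip: str
    epg: str
    vm_tag: str | None = None

def match_epg_selector(ep: Endpoint, epg_name: str) -> bool:
    return ep.epg == epg_name

def match_ip_subnet_selector(ep: Endpoint, subnet: str) -> bool:
    return ipaddress.ip_address(ep.ip) in ipaddress.ip_network(subnet)

def match_tag_selector(ep: Endpoint, tag: str) -> bool:
    return ep.vm_tag == tag

# A hypothetical ESG definition: any endpoint in EPG "web-epg", any endpoint in
# 10.20.0.0/16, or any VM tagged "pci-scope" lands in the same security group.
def classify(ep: Endpoint) -> str | None:
    if (match_epg_selector(ep, "web-epg")
            or match_ip_subnet_selector(ep, "10.20.0.0/16")
            or match_tag_selector(ep, "pci-scope")):
        return "esg-web"
    return None

print(classify(Endpoint(ip="10.20.3.14", epg="legacy-epg")))                   # esg-web (subnet)
print(classify(Endpoint(ip="172.16.1.9", epg="web-epg")))                      # esg-web (EPG)
print(classify(Endpoint(ip="172.16.2.5", epg="db-epg", vm_tag="pci-scope")))   # esg-web (tag)
```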

EPG to ESG Migration Simplified


In cases where your infrastructure is already diligently segmented with EPGs and contracts that reflect application tiers’ dependencies, ESGs are designed to let you migrate your policy with just a little effort.

The first question that probably comes to your mind is how to achieve that. With the EPG Selector, one of the new methods of classifying endpoints into ESGs, we enable a seamless migration to the new grouping concept by inheriting contracts from the EPG level. This is an easy way to quickly move all your endpoints within one or more EPGs into your new ESGs.

For a better understanding, let’s walk through the example below (see Figure 1). We have a simple setup with two EPGs that we will migrate to ESGs. Currently, the communication between them is achieved with contract Ctr-1.

High-level migration steps are as follows:

1. Migrate EPG 1 to ESG 1
2. Migrate EPG 2 to ESG 2
3. Replace the existing contract with the one applied between newly created ESGs.

Figure 1 – Two EPGs with the contract in place

The first step is to create a new ESG 1 in which EPG 1 is matched using the EPG Selector. This means that all endpoints that belong to this EPG become part of the newly created ESG all at once. These endpoints still communicate with the other EPG(s) because of automatic contract inheritance (Note: You cannot configure an explicit contract between an ESG and an EPG).

This state, depicted in Figure 2, is considered an intermediate step of the migration, which the APIC reports with fault F3602 until you migrate the outstanding EPG(s) and contracts. This fault is a way for us to encourage you to continue with the migration process so that all security configurations are maintained by ESGs. This will keep the configuration and design simple and maintainable. However, you do not have to do it all at once. You can progress according to your project schedule.

Figure 2 – Interim migration step

As the next step, with the EPG Selector, you migrate EPG 2 to ESG 2. Keep in mind that nothing stands in the way of placing other EPGs into the same ESG (even if these EPGs refer to different BDs). Communication between ESGs is still allowed thanks to contract inheritance.

To complete the migration, as a final step, configure a new contract with the same filters as the original one – Ctr-1-1. Assign one ESG as a provider and the second as a consumer, which takes precedence over contract inheritance. Finally, remove the original Ctr-1 contract between EPG 1 and EPG 2. This step is shown in Figure 3.

Figure 3 – Final setup with ESGs and new contract

Easy Migration to ACI


The previous example is mainly applicable when segmentation at the EPG level is already applied according to the application dependencies. However, not everyone may realize that ESG also simplifies brownfield migrations from existing environments to Cisco ACI.

A starting point for many new ACI customers is deciding how to implement their EPG design. Typically, the most common choice is to map one subnet to one BD and one EPG, reflecting old VLAN-based segmentation designs (Figure 4). So far, moving from such a state to a more application-oriented approach, where an application is broken up into tiers based on function, has not been trivial. It has often been associated with the need to transfer some workloads between EPGs, or to re-address servers/services, which typically leads to disruptions.

Figure 4 – EPG = BD segmentation design

Introducing application-level segmentation in such a deployment model is challenging unless you use ESGs. So how do I make this migration from pure EPG to using ESG? With the new selectors available, you can start very broadly and then, when ready, begin to define additional detail and policy. It is a multi-stage process that still allows endpoints to communicate without disruption as we make the transition gracefully. In general, the steps of this process can be defined as follows:

1. Classify all endpoints into one “catch-all” ESG
2. Define new segmentation groups and seamlessly move endpoints from the “catch-all” ESG to the newly created ESGs.
3. Continue until all endpoints are assigned to new security groups.

In the first step (Figure 5), you can enable free communication between EPGs, by classifying all of them using EPG selectors and putting them (temporarily) into one “catch-all” ESG. This is conceptually similar to any “permit-all” solutions you may have used prior to ESGs (e.g. vzAny, Preferred Groups).

Figure 5 – All EPGs are temporarily put into one ESG

In the second step (Figure 6), you can begin to shape and refine your security policy by seamlessly taking out endpoints from the catch-all ESG and putting them into other newly created ESGs that meet your security policy and desired outcome. For that, you can use other endpoint selector methods available – in this example – tag selectors. Keep in mind that there is no need to change any networking constructs related to these endpoints. VLAN binding to interfaces with EPGs remains the same. No need for re-addressing or moving between BDs or EPGs.

Figure 6 – Gradual migration from an existing network to Cisco ACI

As you continue to refine your security policies, you will end up in a state where all of your endpoints are now using the ESG model. As your data center fabric grows, you do not have to spend any time worrying about which EPG or which BD subnet is needed because ESG frees you of that tight coupling. In addition, you will gain detailed visibility into endpoints that are part of an ESG that represent a department (like IT or Sales in the above example) or application suite. This makes management, auditing, and other operational aspects easier.

Intuitive route-leaking


It is well understood that getting Cisco ACI to interconnect two VRFs in the same or different tenants is possible without any external router. However, two additional aspects must be ensured for this type of communication to happen. The first is regular routing reachability, and the second is security permission.

Earlier in this blog, I stated that ESGs decouple forwarding from security policy. This is also clearly visible when you need to configure inter-VRF connectivity. Refer to Figure 7 for high-level, intuitive configuration steps.

Figure 7. Simplified route-leaking configuration. Only one direction is shown for better readability

At the VRF level, configure the subnet to be leaked and its destination VRF to establish routing reachability. A leaked subnet must be equal to, or be a subset of, a BD subnet. Next, attach a contract between the ESGs in different VRFs to allow the desired communication to happen. Finally, you can set aside the old requirements of configuring subnets under the provider EPG (instead of under the BD only) and of adjusting the BD scope; these are not required anymore. The end result is a much easier way to set up route leaking, with none of the sometimes confusing and cumbersome steps that were necessary using the traditional EPG approach.

Source: cisco.com

Tuesday, 13 December 2022

EVPN Myth Buster Series – To lead or follow, where does Cisco stand?

Innovation is one of many characteristics defining industry leadership. Becoming an industry leader requires a company to innovate and find new ways to solve customer problems. Industry leadership is not something that can be created overnight. It takes more than a software release or a list of supported features to cement yourself as a technology thought leader in the networking industry. Over the past several decades, Cisco has demonstrated its leadership in networking by leading technology innovations that meet and exceed customers’ needs and then enabling the adoption of these technologies across organizations, driving standardization. This has also led to robust collaborations and interoperability among different vendors for the benefit of the industry as a whole – please refer to Figure 1, Number of RFC authors per affiliation for the top 30 companies at IETF over the past three decades.

Some examples of such technologies are L3VPN, MPLS, and EVPN. Cisco innovators such as Eric Rosen, Yakov Rekhter, and George Swallow incubated MPLS and L3VPN technologies and then led the standardization effort at the IETF. Furthermore, Cisco innovator, Ali Sajassi incubated EVPN technology and then led the standardization effort at the IETF. A well-adopted standard is like a team sport, and it requires not only participation but also contributions from every member of the team – i.e., from every vendor and provider involved.

Figure 1. Number of RFC authors per affiliation for the top 30 companies at IETF over the past three decades

In the past decade, Cisco has introduced EVPN technology to the networking industry at large and has led the standardization efforts for all the initial RFCs for this technology with the help of several vendors and service providers. Although there are vendors who are true partners and contributors to this technology and its standardization, there are “others” who are neither participants nor contributors but just users of it. One can easily find out who is who by looking at the IETF statistics for EVPN.

These “other” vendors have made claims to be pioneers of cloud networking fabrics driving a standards-based approach, even though they are openly adopting and implementing Cisco-authored RFCs and drafts into their software. These “other” vendors attempt to create the perception of open standards as their core pillar. Cisco has been a long-time innovator with a proven track record of developing IETF drafts to facilitate the implementation of new technologies that are widely adopted by these other vendors (“others”) in the networking industry. Being an industry leader requires Cisco to continue evolving and driving standards to make networks work better – please refer to Figure 2. Chart showing the number of EVPN RFC Primary Authorship, EVPN RFC Authorship, and Working-Group Authorship Affiliation:

Figure 2. Number of EVPN RFC Primary Authorship, EVPN RFC Authorship, and Working-Group Authorship Affiliation

IETF


For most of us, it is widely known that the IETF is the premier Internet standards organization. Citing the IETF Standards page:

“Improving existing standards and creating, implementing, and deploying new standards is an ongoing effort. The IETF’s mission is to produce high-quality, relevant technical documents that describe these voluntary standards. IETF working groups are the primary mechanism for the development of IETF specifications and guidelines.”
As EVPN-VXLAN becomes the de facto standard for IP fabrics, Cisco continues to enhance and publish IETF drafts based on the protocols and architectures addressing new requirements and use cases. When Cisco develops the standards and drafts, there is an implementation in mind for the system and its parts, while “others” will choose to follow and implement the RFCs and the drafts without a full understanding of the use cases.

These other vendors will create and leverage feature matrices to fill their gaps and respond to RFPs, citing our documents and acting as if they knew better. Cisco can confidently claim to lead while “others” only follow, and to invent while “others” only adopt.

Figure 3. VXLAN EVPN Industry Contribution

Cisco continues to extend its leadership in promoting open standards, interoperability, and multi-vendor solutions for Cloud Networking technologies.

This series of blogs aimed to provide a deeper understanding of EVPN VXLAN and additions to the IETF drafts implemented for today’s customer deployments.

History of Ethernet VPN (EVPN)


For many years, extending Layer-2 efficiently was a burdensome task. Before the availability of Layer-2 VPNs, technologies such as LANE (LAN Emulation) were used to transport Ethernet across distances, or we just plugged two Ethernet domains together via CWDM or DWDM. All these approaches had their pros and cons, but one common challenge remained: the virtualization of the Layer-2 service across a common infrastructure. When MPLS-based Layer-2 VPNs rose to prominence, true Layer-2 VPNs became available, and with them better use of the underlying transport infrastructure. With VPLS (Virtual Private LAN Service), multipoint-to-multipoint Layer-2 VPNs became affordable and addressed many new use cases. Even though VPLS brought many advantages, the pseudo-wire maintenance, transport dependency, and lack of comprehensive embedded access node redundancy still made it challenging to deploy. While all of this was true over a decade ago, around 2012 we embarked on a new chapter of Layer-2 VPNs with the advent of Ethernet VPN, in short EVPN. In its essence, EVPN addressed the challenges the more traditional L2VPNs incurred and innovated new schemes in Layer-2 address learning to become one of the most successful VPN technologies.

The journey of EVPN as a standard started back in 2010, when Ali Sajassi introduced and presented the very first draft of EVPN (initially called Routed VPLS, draft-sajassi-l2vpn-rvpls-bgp-00.txt) to the IETF (Internet Engineering Task Force) in March of 2010. This draft was later merged with another draft by Rahul Aggarwal (from Juniper), draft-raggarwa-mac-vpn-00.txt, because of their synergy, and a new draft was born in October 2010 – draft-raggarwa-sajassi-l2vpn-evpn-00.txt. This draft became a working group document in February 2012 and became a standard, RFC 7432, in February 2015. This is the de facto base RFC for the basic EVPN behavior and its modes, and subsequent EVPN RFCs build on top of the groundwork of this RFC.

Around the same time as the main EVPN draft introduction, Cisco introduced other EVPN-related drafts such as draft-sajassi-raggarwa-l2vpn-evpn-req-00.txt and draft-sajassi-l2vpn-pbb-evpn-00.txt, in October 2010 and March 2011 respectively, which became standards in February 2014 and September 2015, respectively.

After the publication of the initial EVPN drafts that later became RFCs 7432, 7209, and 7623, Cisco in 2013 published another set of EVPN drafts for virtualization/VXLAN and for inter-subnet forwarding (L2 and L3 forwarding) that gave EVPN the versatility it has today. These drafts later became the standard RFCs 8365 and 9135.

Figure 4. IETF EVPN Timeline

While base EVPN using MPLS encapsulation was celebrating its success in the Service Provider market, an IP-based encapsulation was more suitable for the Data Center. With this in mind, the EVPN draft for “overlays” (draft-sd-l2vpn-evpn-overlay) was published in 2013; it included the VXLAN encapsulation and became RFC 8365 in 2018. To address the various Data Center use cases, a couple of related drafts were filed around the same time: the definition of how to do inter-subnet routing (draft-sajassi-l2vpn-evpn-inter-subnet-forwarding), how an IP Prefix route is advertised in EVPN (draft-rabadan-l2vpn-evpn-prefix-advertisement), and how to interconnect multiple EVPN “overlay” domains with a Data Center Interconnect (draft-rabadan-l2vpn-dci-evpn-overlay). All of these drafts from 2013 are now RFCs and define the standard for how EVPN is used within and between Data Centers.

Figure 5. EVPN RFC for VXLAN and DCI

The world of standards can often seem arcane. Opening it up and sharing some of its history, with the most significant milestones, is as important as defining the standards themselves. For more than a decade, Cisco has actively driven the standardization of EVPN and shared this innovation with the networking industry. With over 50 publications to the IETF, Cisco leads the EVPN standardization effort and is proud of the collaboration with its partnering authors. With the proliferation of EVPN across all of Cisco’s operating systems (IOS-XE, IOS-XR, NX-OS), all fully interoperable, the flexibility to choose the right operational model across deployments in Campus, WAN, Data Center, or Service Provider domains is unmatched.

Source: cisco.com

Wednesday, 19 October 2022

Cisco Nexus: Connect cloud-scale performance and sustainability

Announcing the first Cisco 800G Nexus Switch


It’s the week of the Open Compute Project Global Summit, a conference that attracts the biggest names representing cloud providers, colo facilities, enterprises, telco service providers, media, and government entities, a group that builds and operates high-performance infrastructure. Our customers are here in force, and we launched our blueprint for helping cloud service providers (both hyperscalers and webscale customers) deliver richer cloud applications and services, while balancing their needs for higher-performance, cost-effective, yet more efficient, and hence more sustainable, networking infrastructure.


For our Cloud Networking customers, 2022 is turning out to be a blockbuster year. It was only in June that I wrote about exciting new 400G Nexus platforms. All of these are now shipping. This week I am proud to announce the addition of the first 800G Nexus product to the rich Nexus portfolio. The Nexus 9232E is a 1RU Nexus switch with 32 ports of 800G.


So, what use case needs an 800G switch? I don’t think there is any dispute that the pace and scale of data center networking buildouts are accelerating. Two fun facts:

1. In 2011, the first year of the OCP event, the total volume of data created and stored in the world was just under two zettabytes. In 2022, that’s expected to grow to nearly 100 zettabytes.

2. Similarly, the number of users (by MAU, or monthly active users) for the services of Meta, one of the founding members of OCP, grew from 845 million in 2011 to nearly 3 billion today.

What is becoming clearer are the many use cases for AI/ML, for both network operations teams and application teams. AI/ML capabilities are crucial for digital-twin-style predictability of network change results, as well as for modeling highly customized application outcomes in the real and meta world. Indeed, by 2025, 44 percent of global data created in the core and edge will be driven by analytics, artificial intelligence, and deep learning, and by an increasing number of IoT devices feeding data to the enterprise edge.

It is no wonder that our customers are actively looking for innovative solutions addressing their key questions:

◉ How do I handle traffic/data growth while dealing with an increasingly challenging power/power-cost environment?

◉ Can I continue to scale this infrastructure but do so in a sensible, sustainable way?

◉ Can network bandwidth be utilized more efficiently while still supporting current cloud network deployments?

Cisco’s cloud networking difference


We are hard at work solving these customer challenges. 800G technology is one of many steps on a journey. Our customers want positive outcomes in the areas of experiences, economics, and environment with respect to their hybrid cloud network infrastructure. So, our new Nexus 800G product will deliver the same benefits that Nexus cloud networking customers already enjoy, including:

◉ High Performance: Massive throughput with Silicon One 25.6T G100 ASIC and smart system design that will power the next generation of network innovations and breakthroughs.

◉ Flexibility and Agility: Choices in network operating systems, speed, form factors, optics to meet virtually any use case.

◉ Programmability: Enabled through open APIs and protocol support in our cloud network optimized OS.

◉ Density and Scale: For both fixed and modular systems offering scalability from 100G to 400G to now 800G.

◉ Energy Efficiency: Significantly improved power per bit leveraging 112G Serdes technology.

◉ Simplicity: Manageability and optics backward compatibility, enabling less equipment needed to scale higher.


Source: cisco.com

Tuesday, 27 September 2022

Cisco MDS 9000 FSPF Link Cost Multiplier: Enhancing Path Selection on High-Speed Storage Networks


The need for optimal path selection


When embarking on a journey from one location to another, we always try to determine the best route to follow. This happens according to some preference criteria. Having a single route is the simplest situation, but that would lead to long delays in case of any problem occurring along the way. Availability of multiple paths to a destination is good in terms of reliability. Of course, we would need an optimal path selection tool and clear street signs, or a navigation system with GPS, to avoid loops or getting lost.

In a similar way, data networks are designed with multiple paths to destination for higher availability. Specific protocols enable optimal path selection and loop avoidance. Ethernet networks have used the Spanning Tree Protocol or more recent standards like TRILL. IP networks rely on routing protocols like BGP, OSPF and RIP to determine the best end-to-end path. Fibre Channel fabrics have their own standard routing protocol, called Fabric Shortest Path First (FSPF), defined by INCITS T11 FC-SW-8.

FSPF on Cisco MDS switches


The FSPF protocol is enabled by default on all Cisco Fibre Channel switches. Normally you do not need to configure any FSPF parameters. FSPF automatically calculates the best path between any two switches in a fabric. It can also select an alternative path in the event of the failure of a given link. FSPF regulates traffic routing no matter how complex the fabric might be, including dual datacenter core-edge designs.


FSPF supports multipath routing and bases path status on a link-state protocol called Shortest Path First. It runs on E ports or TE ports, providing a loop-free topology. Routing happens hop by hop, based only on the destination domain ID. FSPF uses a topology database to keep track of the state of the links on all switches in the fabric and associates a cost with each link. It makes use of the Dijkstra algorithm and guarantees a fast reconvergence time in case of a topology change. Every VSAN runs its own FSPF instance. By combining VSAN and FSPF technologies, traffic engineering can be achieved on a fabric. One use case would be to force traffic for a VSAN onto a specific ISL. Also, the use of PortChannels instead of individual ISLs makes the implementation very efficient, as fewer FSPF calculations are required.

FSPF link cost calculation


FSPF protocol uses link costs to determine the shortest path in a fabric between a source switch and a destination switch. The protocol tracks the state of links on all switches in the fabric and associates a cost with each link in its database. Also, FSPF determines path cost by adding the costs of all the ISLs in each path. Finally, FSPF compares the cost of various paths and chooses the path with minimum cost. If multiple paths exist with the same minimum cost, FSPF distributes the load among them.

You can administratively set the cost associated with an ISL as an integer value from 1 to 30000. However, this operation is not necessary, and typically FSPF uses its own default mechanism for associating a cost with all links, as specified in the INCITS T11 FC-SW-8 standard. Essentially, the link cost is calculated based on the speed of the link times an administrative multiplier factor. By default, the value of this multiplier is S=1. In practice, the link cost is inversely proportional to the link bandwidth. Hence the default cost for 1 Gbps links is 1000, for 2 Gbps it is 500, for 4 Gbps it is 250, for 32 Gbps it is 31, and so on.
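The default cost calculation can be reproduced with a few lines of Python. The formula below, a 1 Gbps baseline cost of 1000 divided by the link speed and scaled by the administrative multiplier S, reproduces the default values quoted above; treat it as an approximation of the FC-SW-8 calculation rather than a verbatim copy of the standard.

```python
def fspf_link_cost(speed_gbps: float, multiplier: int = 1) -> int:
    """Approximate FSPF link cost: inversely proportional to bandwidth, scaled by S."""
    return int(multiplier * 1000 / speed_gbps)

for speed in (1, 2, 4, 8, 16, 32, 64):
    print(f"{speed:>3} Gbps -> cost {fspf_link_cost(speed)}")
# 1 Gbps -> 1000, 2 Gbps -> 500, 4 Gbps -> 250, ..., 32 Gbps -> 31, 64 Gbps -> 15
```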

FSPF link cost calculation challenges


It is easy to see that high-speed links introduce some challenges, because the computed link cost becomes smaller and smaller. This becomes a significant issue when the total link bandwidth is over 128 Gbps. For these high-speed links, the default link costs become too similar to one another, leading to inefficiencies.

The situation gets even worse for logical links. FSPF treats a PortChannel as a single logical link between two switches. On the Cisco MDS 9000 series, a PortChannel can have a maximum of 16 member links. With multiple physical links combined into a PortChannel, the aggregate bandwidth scales upward and the logical link cost reduces accordingly. Consequently, different paths may appear to have the same cost although they have different member counts and different bandwidths. Path inefficiencies may occur when PortChannels with as few as 9 x 16 Gbps members are present. This leads to poor path selection by FSPF. For example, imagine two alternative paths to the same destination, one traversing a 9x16G PortChannel and one traversing a 10x16G PortChannel. Although the two PortChannels have different aggregate bandwidths, their link costs compute to the same value.

FSPF link cost multiplier feature


To address the challenge, for now and for the future, the Cisco MDS NX-OS 9.3(1) release introduced the FSPF link cost multiplier feature. This new feature should be configured when parallel paths above the 128 Gbps threshold exist in a fabric. By doing so, FSPF can properly distinguish higher-bandwidth links from one another and is able to select the best path.

All switches in a fabric must use the same FSPF link cost multiplier value. This way they all use the same basis for path cost calculations. This feature automatically distributes the configured FSPF link cost multiplier to all Cisco MDS switches in the fabric running Cisco NX-OS versions that support the feature. If any switches are present in the fabric that do not support the feature, then the configuration fails and is not applied to any switches. After all switches accept the new FSPF link cost multiplier value, a delay of 20 seconds occurs before it is applied. This ensures that all switches apply the update simultaneously.

The new FSPF link cost multiplier value is S=20, as opposed to 1 in the traditional implementation. With a simple change to one parameter, the Cisco implementation keeps using the same standards-based formula as before. With the new value for this parameter, the FSPF link cost computation stays optimal even for PortChannels with 16 members of up to 128 Gbps speed.
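Reusing the same approximate cost function from the earlier sketch, the example below shows why the default multiplier cannot tell apart the two PortChannels from the 9x16G versus 10x16G scenario above, and how S=20 restores the distinction.

```python
def fspf_link_cost(speed_gbps: float, multiplier: int = 1) -> int:
    """Approximate FSPF link cost: inversely proportional to aggregate bandwidth, scaled by S."""
    return int(multiplier * 1000 / speed_gbps)

pc_a = 9 * 16    # 144 Gbps aggregate PortChannel
pc_b = 10 * 16   # 160 Gbps aggregate PortChannel

# Default multiplier (S=1): both PortChannels compute to the same cost.
print(fspf_link_cost(pc_a), fspf_link_cost(pc_b))          # 6 6
# With the FSPF link cost multiplier set to S=20, the higher-bandwidth path is now cheaper.
print(fspf_link_cost(pc_a, 20), fspf_link_cost(pc_b, 20))  # 138 125
```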


Source: cisco.com

Tuesday, 7 June 2022

Implementing Infrastructure as Code- How NDFC works with Ansible and Terraform

Automation has been the focus of interest in the industry for quite some time now. Out of the top tools available, Ansible and Terraform have been popularly used amongst automation enthusiasts like me. While Ansible and Terraform are different in their implementation, they are equally supported by products from the Cloud Networking Business Unit at Cisco (Cisco ACI, DCNM/NDFC, NDO, NXOS). Here, we will discuss how Terraform and Ansible work with Nexus Dashboard Fabric Controller (NDFC). 

First, I will explain how Ansible and Terraform work, along with their workflows. We will then look at the use cases. Finally, we will discuss implementing Infrastructure as Code (IaC).

Ansible – Playbooks and Modules

For those of you that are new to automation, Ansible has two main parts – the inventory file and playbooks. The inventory file gives information about the devices we are automating including any sandbox environments set up. The playbook acts as the instruction manual for performing tasks on the devices declared in the inventory file. 

Ansible becomes a system of documentation once the tasks are written in a playbook. The playbook leverages REST API modules to describe the schema of the data that can be manipulated using REST API calls. Once written, the playbook can be executed using the ansible-playbook command line.

Ansible Workflow
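As a minimal illustration of that workflow, the sketch below simply shells out to the ansible-playbook command described above. The inventory and playbook file names are placeholders for your own files.

```python
import subprocess

# Placeholder file names; substitute your own inventory and playbook.
INVENTORY = "inventory.yml"
PLAYBOOK = "create_vrf_and_network.yml"

def run_playbook(inventory: str, playbook: str) -> int:
    """Execute an Ansible playbook against the devices declared in the inventory file."""
    result = subprocess.run(["ansible-playbook", "-i", inventory, playbook])
    return result.returncode

if __name__ == "__main__":
    raise SystemExit(run_playbook(INVENTORY, PLAYBOOK))
```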

Terraform – Terraform Init, Plan and Apply


Terraform has one main part – the TF template. The template contains the provider details, the devices to be automated, and the instructions to be executed. The following are the three main points about Terraform:

1. Terraform defines infrastructure as code and manages the full lifecycle. It creates new resources, manages existing ones, and destroys those no longer necessary.

2. Terraform offers an elegant user experience for operators to predictably make changes to infrastructure.

3. Terraform makes it easy to re-use configurations for similar infrastructure designs.

While Ansible uses one command to execute a playbook, Terraform uses three to four commands to execute a template. terraform init checks the configuration files and downloads the required provider plugins. terraform plan creates an execution plan and lets the user check that it matches the desired intent. terraform apply applies the changes, while terraform destroy deletes the Terraform-managed infrastructure.

Once a template is executed for the first time, Terraform creates a file called terraform.tfstate to store the state of the infrastructure after execution. This file is used when making subsequent changes to the infrastructure. Execution of the tasks is also declarative; in other words, the order in which resources are written doesn't matter.

[Figure: Terraform Workflow]

Use Cases of Ansible and Terraform for NDFC


Ansible executes tasks from top to bottom. When using the NDFC GUI, it gets tedious to manage all the required configuration when there are many switches in a fabric. For example, configuring multiple vPCs or dealing with network attachments for each of these switches can take a lot of time. Ansible uses a variable in the playbook called state to perform activities such as creation, modification, and deletion, which simplifies making these changes. The playbook uses whichever modules fit the task at hand to execute the required configuration changes.
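As a small illustration of that state variable, the sketch below shows a single task whose behavior changes with the state value. The module and field names follow the cisco.dcnm collection, but the fabric and network values are placeholders.

```yaml
# Sketch: the same task creates, updates, or removes a network depending on the state value
- name: Manage a network on NDFC
  cisco.dcnm.dcnm_network:
    fabric: my-fabric            # placeholder fabric name
    state: merged                # merged / replaced / overridden / deleted / query
    config:
      - net_name: network_web
        vrf_name: vrf_prod
        vlan_id: 2301
        gw_ip_subnet: 192.168.30.1/24
# Changing state to "deleted" with the same config would remove the network instead.
```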

Terraform follows an infrastructure as code approach for executing tasks. We have one main.tf file that contains all the tasks, which are executed with the terraform plan and terraform apply commands. terraform plan lets the provider verify the tasks and check for errors, and terraform apply executes the automation. In order to interact with application-specific APIs, Terraform uses providers. Every Terraform configuration must declare a provider, which is installed and used to execute the tasks. Providers power all of Terraform's resource types, and ready-made modules are available for quickly deploying common infrastructure configurations. The provider block has a field where we specify whether the resources are provided by DCNM or NDFC.

[Figure: Ansible Code Example]

[Figure: Terraform Code Example]

Below are a few examples of how Ansible and Terraform work with NDFC. Using the ansible-playbook command, we can execute our playbook to create a VRF and network.

[Screenshot: ansible-playbook execution creating the VRF and network]
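To make the idea concrete, here is a trimmed-down sketch of what such a playbook could look like. It assumes the dcnm_vrf and dcnm_network modules from the cisco.dcnm collection, and the fabric, VRF, network, and switch values are placeholders rather than the exact ones shown above.

```yaml
---
# Sketch of a playbook that creates a VRF and a network on NDFC (all values are placeholders)
- name: Provision overlay resources on NDFC
  hosts: ndfc
  gather_facts: false
  tasks:
    - name: Create the VRF
      cisco.dcnm.dcnm_vrf:
        fabric: my-fabric
        state: merged
        config:
          - vrf_name: vrf_prod
            vrf_id: 50001
            vlan_id: 2000

    - name: Create the network and attach it to a switch
      cisco.dcnm.dcnm_network:
        fabric: my-fabric
        state: merged
        config:
          - net_name: network_web
            vrf_name: vrf_prod
            net_id: 30001
            vlan_id: 2301
            gw_ip_subnet: 192.168.30.1/24
            attach:
              - ip_address: 10.0.0.21       # switch management IP (placeholder)
                ports: [Ethernet1/1]
            deploy: true
```

Running ansible-playbook against this file with the inventory sketched earlier would push both objects to the fabric in one pass.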

Below is a sample of how a Terraform code execution looks: 

[Screenshot: Terraform execution output]

Infrastructure as Code (IaC) Workflow 


[Figure: Infrastructure as Code – CI/CD Workflow]

One popular way to use Ansible and Terraform is to drive them from a continuous integration (CI) process and then promote the changes through a continuous delivery (CD) system upon a successful application build:

◉ The CI asks Ansible or Terraform to run a script that deploys a staging environment with the application.

◉ When the staging tests pass, CD then proceeds to run a production deployment.

◉ Ansible/Terraform can then check out the history from version control on each machine or pull resources from the CI server.

An important benefit highlighted through IaC is the simplification of testing and verification. CI rules out many common issues if we have enough test cases after deploying on the staging network. CD then automatically deploys these changes onto production with a simple click of a button.

While Ansible and Terraform have their differences, NDFC supports automation through both equally, and customers can choose either one or even both.

Terraform and Ansible complement each other in the sense that both are great at handling IaC and the CI/CD pipeline. The virtualized infrastructure configuration stays in sync with changes as they occur in the automation scripts.

There are multiple DevOps tools available to handle the runner jobs – GitLab, Jenkins, AWS, and GCP, to name a few.

In the example below, we will see how GitLab and Ansible work together to create a CI/CD pipeline. For each change in code that is pushed, CI triggers an automated build-and-verify sequence on the staging environment for the given project, which provides feedback to the project developers. With CD, infrastructure provisioning and production deployment take place once the CI verification sequence has completed successfully.

As we have seen above, Ansible works in a similar way to a command-line interpreter: we define a set of commands to run against our hosts in a simple and declarative way. We also have a reset YAML file that we can use to revert all changes we make to the configuration.

NDFC works together with Ansible and the GitLab Runner to build a CI/CD pipeline.

GitLab Runner is an application that works with GitLab CI/CD to run jobs in a pipeline. Our CI/CD job pipeline runs in a Docker container: we install GitLab Runner on a Linux server and register a runner that uses the Docker executor. We can also limit the number of people with access to the runner, so that pull requests (PRs) can be raised and the merge approved by a select group of people.

Step 1: Create a repository for the staging and production environments and an Ansible file to keep credentials safe. Here, I have used the ansible-vault command to encrypt the credentials file for NDFC.
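As a sketch, the credentials file could be a small variables file that is then encrypted in place with the ansible-vault encrypt command; the variable names below are hypothetical and only need to match what the playbooks and inventory reference.

```yaml
# vault.yml (hypothetical name) -- encrypt with: ansible-vault encrypt vault.yml
vault_ndfc_user: admin
vault_ndfc_password: "REPLACE_ME"
```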

Step 2: Create an Ansible file for resource creation. In our case, we have one main file each for staging and production, followed by a group_vars folder that holds all the information about the resources. The main file pulls the details from the group_vars folder when executed.

[Screenshot: staging and production playbooks with the group_vars folder]

Step 3: Create a workflow file and check the output.

As shown above, our hosts.prod.yml and hosts.stage.yml inventory files act as the main files for implementing resource allocation to production and staging, respectively. Our group_vars folder contains all the resource information, including fabric details, switch information, and overlay network details.
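A bare-bones version of that workflow file might look like the .gitlab-ci.yml sketch below. The stage layout, image, playbook name (deploy.yml), and rules are illustrative assumptions, not the exact pipeline shown in the screenshots.

```yaml
# .gitlab-ci.yml sketch: verify against staging on merge requests, deploy to production after the merge
stages:
  - staging
  - production

deploy_staging:
  stage: staging
  image: python:3.11                 # any image where Ansible can be installed
  before_script:
    - pip install ansible
  script:
    - ansible-playbook -i hosts.stage.yml deploy.yml --vault-password-file password.txt
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

deploy_production:
  stage: production
  image: python:3.11
  before_script:
    - pip install ansible
  script:
    - ansible-playbook -i hosts.prod.yml deploy.yml --vault-password-file password.txt
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'   # runs once the PR has been merged
```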

For the example above, we will show how adding a network to the overlay.yml file and then committing this change invokes a CI/CD pipeline for this architecture.

Step 4 (optional): Create a password file. Create a new file called password.txt containing the Ansible Vault password used to encrypt and decrypt the vault file.

[Screenshot: password.txt vault password file]

Our overlay.yml file currently has two networks, and our staging and production environments have been reset to this state. We will now add our new network, network_db, to the YAML file as below:

[Screenshot: overlay.yml with the new network_db entry]
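As a rough sketch of that edit (the key names are hypothetical and depend on how the repo's group_vars schema is laid out), the change could look like this, with network_db appended to the existing list:

```yaml
# overlay.yml sketch -- the first two networks already exist; network_db is the new entry
networks:
  - net_name: network_web
    vrf_name: vrf_prod
    vlan_id: 2301
    gw_ip_subnet: 192.168.30.1/24
  - net_name: network_app
    vrf_name: vrf_prod
    vlan_id: 2302
    gw_ip_subnet: 192.168.31.1/24
  - net_name: network_db           # newly added network that triggers the CI/CD pipeline
    vrf_name: vrf_prod
    vlan_id: 2303
    gw_ip_subnet: 192.168.32.1/24
```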

First, we make this change to staging by raising a PR. Once it has been verified, the repo admin can approve the PR merge, which applies the changes to production.

Once we make these changes to the Ansible file, we create a branch under this repo to which we commit the changes.

After this branch has been created, we raise a PR. This automatically starts the CI pipeline.

[Screenshot: CI pipeline triggered by the pull request]

Once the staging verification has passed, the admin/manager of the repo can approve the merge, which kicks off the CD pipeline for the production environment.

[Screenshot: CD pipeline running against the production environment]

If we check the NDFC GUI, we can see that both staging and production now contain the new network, network_db.

[Screenshot: NDFC GUI showing network_db in both staging and production]

Source: cisco.com