Saturday 12 November 2022

How to Use Presence Web Services

Presence of mind

Jabber is so last decade. Webex and its competition are the best modern means of messaging. But Cisco IM&P, a companion server to Cisco Call Manager, is still the best way to subscribe to user presence updates.


Suppose you have a group of employees to whom you assign tasks as they come in. If you can watch the presence of that group, you’ll know who is available, who is away, who is on the phone, etc. You can build an application that automatically assigns tasks according to the presence of the users.

The Presence Web Services (PWS) API, a feature of Cisco IM&P, is ideal for this kind of application. In my experience as a former developer support engineer, I noticed many developers don’t quite understand how to use PWS properly. I hope that by the time you’re done reading this, you’ll have a good grasp of everything involved in making PWS work for you.

Here’s a condensed breakdown of the steps:

1. Log in an application user with app username and password

a. This operation returns the application user session key

2. Use the application user session key to log in an end user

a. This operation returns an end user session key

3. Create a web service to handle presence notifications

a. Run this web service to listen on a common port, e.g., 8080

4. Use the application user session key to register the URL of your web service as an endpoint

a. This returns an endpoint ID

5. Use the end user session key to subscribe to one or more end user contacts

a. This returns a subscription ID

6. Create a script to fetch the subscribed presence, using the subscription ID

a. For example, get_subscribed_presence.py

In steps 1 and 2, there’s a parameter called “force”. If you set “force=true”, the server returns a new session key every time. I recommend “force=false”, so that the server keeps reusing the same session key. This covers a multitude of programming sins.

In step 3, it is important to use a common port, like 80, 82, 8080, etc. If your web service is based on Python and you use the Flask library, the default port for Flask is 5000, which will not work. You must tell Flask to use one of the common ports instead.

Once you have completed steps 1 through 5, any change in the presence of the contacts in your step 5 subscription will trigger a REST GET operation on the endpoint. The GET will pass two parameters: id, the subscription ID, which should always be 1 with these scripts, and etype, which should always be “PRESENCE_NOTIFICATION”.
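Here’s a minimal sketch of a handler for that GET. It uses only the Python standard library instead of Flask, so it is a stand-in for illustration and not the sample endpoint.py. The path /pws and the helper name parse_notification are my assumptions; the parameter names id and etype come from the description above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def parse_notification(path):
    """Return (subscription_id, etype) from a notification request path."""
    query = parse_qs(urlparse(path).query)
    # parse_qs maps each name to a list; take the first value, or None if absent.
    sub_id = query.get("id", [None])[0]
    etype = query.get("etype", [None])[0]
    return sub_id, etype


class NotificationHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sub_id, etype = parse_notification(self.path)
        if sub_id is None or etype is None:
            self.send_response(400)
            self.end_headers()
            return
        # A real handler would now fetch the subscribed presence for sub_id.
        print(f"notification: id={sub_id} etype={etype}")
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    # Bind to a common port such as 8080 rather than a framework default.
    HTTPServer(("0.0.0.0", port), NotificationHandler).serve_forever()
```

Calling serve() starts the listener and blocks until the process is stopped. The notification itself carries no presence data, so the handler only records which subscription changed.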

Your application should then use the subscription ID to fetch all the presence changes for that subscription. The API for that is getSubscribedPresence. The script that invokes getSubscribedPresence is, coincidentally, get_subscribed_presence.py.

The sample scripts use REST, but you can also use SOAP.

No problemo!

A common problem occurs when you run your endpoint after a contact’s presence already changed. The server will send a presence notification to the endpoint, but the endpoint isn’t running, so that notification never gets to the endpoint, and the endpoint doesn’t fetch the subscribed presence information. This is a problem because, if for any reason you don’t fetch the presence values on that subscription, the server will stop sending future notifications until you do.

So, the script you create in Step 6 is a fail-safe. Suppose a contact, Carlotta Tendant, switches from AVAILABLE to AWAY. The server will notify the web service at the endpoint URL that a change in presence occurred. If your endpoint isn’t active, or it does not pick up the notification and fetch the presence information, the server will stop sending presence notifications until you fetch that presence information.

It is important to know that the presence notification doesn’t send any contact information or the fact that Carlotta is now AWAY; it just notifies the web service that a presence has changed for one or more contacts for that subscription. Your web service must fetch the information about the contact and the contact’s presence.

To avoid the possibility of missed notifications, run the get_subscribed_presence.py script once everything is set up and ready and your endpoint is running. This grabs the information for the users and their presence, and thus clears the queue for the server to send new presence notifications.

There is another reason the web service may not receive a notification. If the Cisco IM&P server CPU usage reaches 80% or higher, the server stops sending notifications until the CPU usage drops below 80%. Here’s how to compensate for that possibility. Write your app to perform a get subscribed presence at an interval of every 10 minutes (or whichever seems best), just to make sure that if, for any reason, your application did not act on a presence notification, the queue will clear, and notifications will continue.
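The periodic fetch can be sketched as a small loop. This is an assumption-laden outline, not part of my sample scripts: the function name poll_subscribed_presence is made up, and fetch stands for whatever callable you write to perform the get subscribed presence request.

```python
import time


def poll_subscribed_presence(fetch, interval_seconds=600, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly, pausing interval_seconds between calls.

    max_polls=None means poll forever. Returns the number of polls performed.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            fetch()
        except Exception as exc:
            # One failed fetch should not stop later polls.
            print(f"fetch failed: {exc}")
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)
    return polls
```

The default of 600 seconds matches the 10-minute interval suggested above. Passing sleep as a parameter makes the loop easy to exercise in a test without waiting.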

Scripts

WARNING: Don’t use my sample scripts on a production server. These are for instructional purposes only.

My sample scripts are as follows:

pws-create.py

pws-delete.py

endpoint.py

get_subscribed_presence.py

And there are some data files the script uses to get information about the server, the host for the endpoint, app user, end user, and the contacts for your presence subscription.

serverparams.json (points to your Cisco IM&P server and the host IP address for the endpoint)

appuser.json (has the application username and password)

enduser.json (has the end user name. You use the session key from your application user login)

contacts.list (the list of contacts for which you will subscribe to get presence notifications)
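As a rough sketch of reading these data files, the helpers below assume that the .json files hold plain JSON objects and that contacts.list holds one contact per line. Neither format is spelled out here, so treat both the layout and the helper names load_json and load_contacts as assumptions.

```python
import json
from pathlib import Path


def load_json(path):
    """Load one of the .json parameter files into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def load_contacts(path):
    """Read one contact per line, skipping blank lines and stray whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```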

Order Up

Here’s how you run the scripts, in order.

1. python3 pws-delete.py

    1. This removes all endpoints and subscriptions so you can start fresh

2. python3 pws-create.py

    1. This sets up the endpoint and subscribes to the presence of the contacts in contacts.list. It uses serverparams.json to identify your Cisco IM&P server and the IP address of the host where your endpoint will run.

3. python3 endpoint.py

    1. This is the endpoint script. It uses the Flask Python library to work as a web service.

4. python3 get_subscribed_presence.py 1 BASIC_PRESENCE (or RICH_PRESENCE)

    1. You run this after the endpoint web service is up and running. This clears out any pending subscription updates and notifications so that the queue is empty and future notifications will work.

If you look at the code in the sample endpoint script, you’ll see that the web service endpoint doesn’t include the code to fetch the subscribed presence. I put all that into the get_subscribed_presence.py script. My endpoint simply executes the script externally like so:

subprocess.run("python3 get_subscribed_presence.py "+id+" "+etype, shell=True)
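The call above builds a shell string from request values. A variant that passes an argument list avoids shell quoting problems when those values come from an HTTP request. This is a sketch of that alternative, not what the sample script does; build_command and run_fetch are hypothetical names.

```python
import subprocess
import sys


def build_command(sub_id, etype, script="get_subscribed_presence.py"):
    """Build the argument list for running the fetch script."""
    # sys.executable is the Python interpreter running the current process.
    return [sys.executable, script, str(sub_id), str(etype)]


def run_fetch(sub_id, etype):
    # No shell is involved, so the values are passed through as literal arguments.
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(build_command(sub_id, etype), check=True)
```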

The endpoint will know the value of id and etype and pass the values when it runs get_subscribed_presence.py. If you want to run the script yourself, however, you need to pass values at the command line, for example:

python3 get_subscribed_presence.py 1 BASIC_PRESENCE

You can also use RICH_PRESENCE instead if that’s what you want. If you’ve done everything correctly, the subscription ID will always be 1, which is why you pass the number 1 to the script at the command line.
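For running the script by hand, the two command-line values could be validated along these lines. This is only a sketch: I’m assuming the script accepts exactly the two presence types named above, and parse_args and the argument names are illustrative.

```python
import argparse

# The two presence types mentioned in the text.
PRESENCE_TYPES = ("BASIC_PRESENCE", "RICH_PRESENCE")


def parse_args(argv):
    """Parse '<subscription id> <presence type>' from a list of arguments."""
    parser = argparse.ArgumentParser(description="Fetch subscribed presence")
    parser.add_argument("subscription_id", type=int)
    parser.add_argument("presence_type", choices=PRESENCE_TYPES)
    return parser.parse_args(argv)
```

With this in place, parse_args(["1", "BASIC_PRESENCE"]) yields subscription_id 1 and presence_type "BASIC_PRESENCE", and an unknown presence type exits with a usage message.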

The sample script doesn’t do anything with the presence information. It prints it to the console where you run the endpoint web service. Your application must perform your needed task, such as updating a display of contacts and their presence. 

Source: cisco.com

Friday 11 November 2022

Cisco Champions the Powerful, Evolving Networking Software Stack


With the interconnection of billions of devices in public and private networks and many applications and services moving to the cloud, software is increasingly becoming independent of and abstracted from hardware. At public cloud vendors like Amazon Web Services, Google Cloud Platform, and Microsoft Azure, hardware has been commoditized and software has taken center stage.

At Cisco, resellers and enterprise customers put complex solutions together using our products. The integration of switches, routers, and other gear with software used to require up to a one-year qualification cycle. But with the cloud providers, it’s immediate. Today, more native cloud concepts have been added to Cisco IOS XE software. Quarter by quarter, our enterprise software is becoming more efficient and cost-effective, more automated, and more programmable.

From Physical to Virtual to Cloud Native 


The first incarnation of Cisco enterprise cloud-enabled products was the virtualization of physical hardware devices in the cloud as virtual machines. They had all the existing concepts and features customers were used to in existing physical Cisco platforms.

In recent years we’ve been moving from physical to virtual to cloud-native products. As customers are becoming more aware and ready to consume cloud-native features, Cisco IOS XE is being enriched to provide those features. At 190 million lines of code, or more than 300 million when vendor software development kits (SDKs) and open-source libraries are added, Cisco IOS XE runs 80+ platforms for access, distribution, core, wireless, and WAN layers. It facilitates a myriad of combinations of hardware and software, forwarding, and physical and virtual form factors.

Why Cisco? 


Prospective Cisco customers and competitors may ask, why spend $5000 for an enterprise switch when you can spend $1000? The answer is that our customers know a cheaper switch may lack the features they need. Less expensive gear will also potentially add to their maintenance costs because the components may not be as good as Cisco’s.

Another reason to buy Cisco is the breadth of our enterprise portfolio. Any one company can do one vertical market well. With IOS XE, we have integrated everything across the networking software stack, and across the entire enterprise network, and we’re working to keep it simple across multiple network domains.

Efficiency and Cost-effectiveness 


With networking becoming increasingly feature-rich and complex, simpler networking software translates to greater efficiency, a smaller headcount, and fewer onsite visits to fix problems. For example, Cisco IOS XE provides simplified app hosting using a Docker image in a container and deployment using device controller tools. It supports third-party, off-the-shelf applications built using Linux toolchains that allow business apps to run at the network edge.

Other examples include the simplification of development, debugging, and device validation with Cisco Platform Abstraction (CPA) and unified software tracing that integrates traces from software running anywhere in a network for more complete visibility into 100+ processes in real-time. Another example of Cisco IOS XE simplicity is virtualization technology that runs over optical fiber, enabling switches to be physically located up to thousands of miles away from each other.

The Power of Automation 


Cisco IOS XE is becoming more and more self-driving. Cisco developers are increasingly taking away the manual tasks required to manage the network by automating them. That makes networks easier and cheaper to maintain and faster to debug.

Examples include the automation of image upgrades using Cisco DNA Center and support for programmable microservices to replace manual device upgrades, repurposing, and management. Other automated processes include streaming telemetry and analytics in all layers of software that run at the speed of events observed (e.g., faster than two million route updates per second) to handle the huge scale of networking operations.

Programmability 


Systems administrators in enterprise companies are constantly upgrading, repurposing, and managing thousands of switches. An advanced networking software stack must be able to manage multi-vendor networks using native and open-source data models. Cisco IOS XE supports a suite of Google Remote Procedure Call (gRPC)-based microservices that simplify and lighten workloads with programmability. They allow administrators to programmatically manage Cisco enterprise devices.

The IOS XE Development Environment  


A lot of enterprise software takes years to develop. The Cisco software development environment rolls out new solutions in months.

Developers spend 60-70% of their time developing software instead of application logic. The IOS XE development environment is automating as many common capabilities (like show commands, tracing, telemetry, export for dashboard, hand wiring HA code, testing base ISSU compatibility checks, and mocking for unit tests) as possible to avoid the need to hand code them. With hand coding, every one of these features would require developers to generate two-to-three times as much code. Hand coding is also not amenable to automated, flexible deployments and in the current development trajectory will not fit into the low-footprint devices we ship.

The Cisco Enterprise Networking software development team works at a solution level, conducting pre-qualification testing and providing the tools to control an entire enterprise network from a single dashboard.

Source: cisco.com

Thursday 10 November 2022

Cisco Secure Firewall on AWS: Build resilience at scale with stateful firewall clustering


Organizations embrace the public cloud for the agility, scalability, and reliability it offers when running applications. But just as organizations need these capabilities to ensure their applications operate where needed and as needed, they also require that their security do the same. Organizations may introduce multiple individual firewalls into their AWS infrastructure to produce this outcome. In theory, this may be a good decision, but in practice it could lead to asymmetric routing issues. Complex SNAT configuration can mitigate asymmetric routing issues, but this isn’t practical for sustaining public cloud operations. Organizations are looking out for their long-term cloud strategies by ruling out SNAT and are calling for a more reliable and scalable solution for connecting their applications and security for always-on protection.

To solve these challenges, Cisco created stateful firewall clustering with Secure Firewall in AWS.

Cisco Secure Firewall clustering overview


Firewall clustering for Secure Firewall Threat Defense Virtual provides a highly resilient and reliable architecture for securing your AWS cloud environment. This capability lets you group multiple Secure Firewall Threat Defense Virtual appliances together as a single logical device, known as a “cluster.”

A cluster provides all the conveniences of a single device (management and integration into a network) while taking advantage of the increased throughput and redundancy you would expect from deploying multiple devices individually. Cisco uses Cluster Control Link (CCL) for forwarding asymmetric traffic across devices in the cluster. Clusters can go up to 16 members, and we use VxLAN for CCL.

In this case, clustering has the following roles:

Figure 1: Cisco Secure Firewall Clustering Overview

The above diagram explains traffic flow between the client and the server with the insertion of the firewall cluster in the network. The following sections define the clustering roles and how packet flow interacts with each of them.

Clustering roles and responsibilities 


Owner: The Owner is the node in the cluster that initially receives the connection.

◉ The Owner maintains the TCP state and processes the packets. 
◉ A connection has only one Owner. 
◉ If the original Owner fails, the new node receives the packets, and the Director chooses a new Owner from the available nodes in the cluster.

Backup Owner: The node that stores TCP/UDP state information received from the Owner so that the connection can be seamlessly transferred to a new owner in case of failure.

Director: The Director is the node in the cluster that handles owner lookup requests from the Forwarder(s). 

◉ When the Owner receives a new connection, it chooses a Director based on a hash of the source/destination IP address and ports. The Owner then sends a message to the Director to register the new connection. 
◉ If packets arrive at any node other than the Owner, the node queries the Director. The Director then seeks out and defines the Owner node so that the Forwarder can redirect packets to the correct destination. 
◉ A connection has only one Director. 
◉ If a Director fails, the Owner chooses a new Director.
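To illustrate the idea of choosing a Director from a hash of the connection’s addresses and ports, here is a minimal sketch. It is not the hash the cluster actually uses: SHA-256, the key layout, and the function name choose_director are all assumptions made for illustration.

```python
import hashlib


def choose_director(src_ip, src_port, dst_ip, dst_port, nodes):
    """Pick a node deterministically from a hash of the connection tuple."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and map it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]
```

Because the result depends only on the connection tuple and the node list, any node that computes it for the same connection arrives at the same Director without extra coordination.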

Forwarder: The Forwarder is a node in the cluster that redirects packets to the Owner. 

◉ If a Forwarder receives a packet for a connection it does not own, it queries the Director to seek out the Owner.  
◉ Once the Owner is defined, the Forwarder establishes a flow, and redirects any future packets it receives for this connection to the defined Owner.
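The Forwarder behavior can be sketched as a lookup followed by a cached flow entry. This toy model only mirrors the description above; the class and method names are hypothetical and say nothing about how the product is implemented.

```python
class Director:
    """Keeps the mapping from a connection to its Owner node."""

    def __init__(self):
        self._owners = {}

    def register(self, conn, owner):
        self._owners[conn] = owner

    def lookup(self, conn):
        return self._owners.get(conn)


class Forwarder:
    """Redirects packets for connections it does not own."""

    def __init__(self, director):
        self._director = director
        self._flows = {}   # connection -> Owner, learned from the Director
        self.queries = 0   # number of Director lookups performed

    def redirect(self, conn):
        if conn not in self._flows:
            # First packet for this connection: ask the Director for the Owner.
            self.queries += 1
            owner = self._director.lookup(conn)
            if owner is None:
                return None
            self._flows[conn] = owner
        # Later packets reuse the established flow without another query.
        return self._flows[conn]
```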

Fragment Owner: For fragmented packets, cluster nodes that receive a fragment determine a Fragment Owner using a hash of the fragment source IP address, destination IP address, and the packet ID. All fragments are then redirected to the Fragment Owner over Cluster Control Link.  

Integration with AWS Gateway Load Balancer (GWLB)


Cisco added support for AWS Gateway Load Balancer (Figure 2). This feature enables organizations to scale their firewall presence as needed to meet demand.

Figure 2: Cisco Secure Firewall and AWS Gateway Load Balancer integration 

Cisco Secure Firewall clustering in AWS


Building off the previous figure, organizations can take advantage of the AWS Gateway Load Balancer with Secure Firewall’s clustering capability to evenly distribute traffic at the Secure Firewall cluster. This enables organizations to maximize the benefits of clustering capabilities including increased throughput and redundancy. Figure 3 shows how positioning a Secure Firewall cluster behind the AWS Gateway Load Balancer creates a resilient architecture. Let’s take a closer look at what is going on in the diagram.

Figure 3: Cisco Secure Firewall clustering in AWS

Figure 3 shows an Internet user looking to access a workload. Before the user can access the workload, the user’s traffic is routed to Firewall Node 2 for inspection. The traffic flow for this example includes:

User -> IGW -> GWLBe -> GWLB -> Secure Firewall (2) -> GWLB -> GWLBe -> Workload

In the event of failure, the AWS Gateway Load Balancer cuts off existing connections to the failed node, making the above solution non-stateful.

Recently, AWS announced a new feature for their load balancers known as Target Failover for Existing Flows. This feature enables forwarding of existing connections to another target in the event of failure.
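As a rough illustration of turning this feature on for a Gateway Load Balancer target group, the sketch below builds the target group attributes that, as far as we understand the Elastic Load Balancing API, control whether existing flows are rebalanced. The helper names are hypothetical, the target group ARN is supplied by the caller, and the attribute keys should be verified against current AWS documentation before use.

```python
def target_failover_attributes(mode="rebalance"):
    """Build target group attributes for target failover of existing flows."""
    if mode not in ("rebalance", "no_rebalance"):
        raise ValueError("mode must be 'rebalance' or 'no_rebalance'")
    # Both attributes are set to the same value in this sketch.
    return [
        {"Key": "target_failover.on_deregistration", "Value": mode},
        {"Key": "target_failover.on_unhealthy", "Value": mode},
    ]


def enable_target_failover(target_group_arn, mode="rebalance"):
    # boto3 is a third-party package; it is imported here so the helper
    # above can be used without it.
    import boto3

    client = boto3.client("elbv2")
    return client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=target_failover_attributes(mode),
    )
```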

Cisco is an early adopter of this feature and has combined Target Failover for Existing Flows with Secure Firewall clustering capabilities to create the industry’s first stateful cluster in AWS.

Figure 4: Cisco Secure Firewall clustering rehashing existing flow to a new node

Figure 4 shows a firewall failure event and how the AWS Gateway Load Balancer uses the Target Failover for Existing Flows feature to switch the traffic flow from Firewall Node 2 to Firewall Node 3. The traffic flow for this example includes:

User -> IGW -> GWLBe -> GWLB -> Secure Firewall (3) -> GWLB -> GWLBe -> Workload

Source: cisco.com

Tuesday 8 November 2022

Introducing Cisco Cloud Network Controller on Google Cloud Platform – Part 3

Part 1 and Part 2 of this blog series covered native cloud networking and firewall rules automation on GCP, and a read through is recommended for completeness. This final post of the series is about enabling external access for cloud resources. More specifically, it will focus on how customers can enable external connectivity from and to GCP, using either Cloud Native Router or Cisco Cloud Router (CCR) based on Cisco Catalyst 8000v, depending on use case.

By expanding previous capabilities, Cisco Cloud Network Controller (CNC) will provision routing, automate VPC peering between infra and user VPCs, and establish BGP IPSec connectivity to external networks with only a few steps using the same policy model.

Scenario


This scenario will leverage the existing configuration built previously, represented by the network-a and network-b VPCs. These user VPCs will be peered with the infra VPC in a hub and spoke architecture, where GCP cloud native routers will be provisioned to establish BGP IPSec tunnels with an external IPSec device. Each GCP cloud native router is composed of a Cloud Router and a High-availability (HA) Cloud VPN gateway.

The high-level topology below illustrates the additional connections automated by Cisco CNC.


Provisioning Cloud Native Routers


The first step is to enable external connectivity under Region Management by selecting in which region cloud native routers will be deployed. For this scenario, they will be provisioned in the same region as the Cisco CNC as depicted on the high-level topology. Additionally, default values will be used for the IPSec Tunnel Subnet Pool and BGP AS under the Hub Network representing the GCP Cloud Router.

The cloud native routers are purposely provisioned in a different region from the user VPCs to illustrate the ability to have a dedicated hub network with external access. However, they could have been deployed in the same region as the user VPCs.


Note: a brief overview of the Cisco CNC GUI was provided in Part 1.

Enabling External Networks


The next step is to create an External Network construct within the infra tenant. This is where an external VRF is also defined to represent external networks connected to on-premises data centers or remote sites. Any cloud VRF mapped to existing VPC networks can leak routes to this external VRF or can get routes from it. In addition to the external VRF definition, this is also where VPN settings are entered with the remote IPSec peer details.

The configuration below illustrates the stitching of the external VRF and the VPN network within the region where the cloud native routers are being provisioned in the backend. For simplicity, the VRF was named “external-vrf”, but in a production environment the name should be chosen carefully and aligned with the external network to simplify operations.


The VPN network settings require the public IP of the remote IPSec device, the IKE version, and the BGP AS. As indicated earlier, the default subnet pool is being used.


Once the external network is created, Cisco CNC generates a configuration file for the remote IPSec device to establish BGP peering and IPSec tunnels with the GCP cloud native routers. Below is the option to download the configuration file.


Configuring External IPSec Device


Because the configuration file provides most of the configuration required for the external IPSec device, customization is needed only for the tunnel source interface and routing settings, where applicable, to match local network requirements. In this example, the remote IPSec device is a virtual router using interface GigabitEthernet1. For brevity, the config for only one of the IPSec tunnels is shown below, along with all the other config generated by Cisco CNC.

vrf definition external-vrf
    rd 100:1
    address-family ipv4
    exit-address-family

interface Loopback0
    vrf forwarding external-vrf
    ip address 41.41.41.41 255.255.255.255

crypto ikev2 proposal ikev2-1
    encryption aes-cbc-256 aes-cbc-192 aes-cbc-128
    integrity sha512 sha384 sha256 sha1
    group 24 21 20 19 16 15 14 2

crypto ikev2 policy ikev2-1
    proposal ikev2-1

crypto ikev2 keyring keyring-ifc-3
    peer peer-ikev2-keyring
        address 34.124.13.142
        pre-shared-key 49642299083152372839266840799663038731

crypto ikev2 profile ikev-profile-ifc-3
    match address local interface GigabitEthernet1
    match identity remote address 34.124.13.142 255.255.255.255
    identity local address 20.253.155.252
    authentication remote pre-share
    authentication local pre-share
    keyring local keyring-ifc-3
    lifetime 3600
    dpd 10 5 periodic

crypto ipsec transform-set ikev-transport-ifc-3 esp-gcm 256
    mode tunnel

crypto ipsec profile ikev-profile-ifc-3
    set transform-set ikev-transport-ifc-3
    set pfs group14
    set ikev2-profile ikev-profile-ifc-3

interface Tunnel300
    vrf forwarding external-vrf
    ip address 169.254.0.2 255.255.255.252
    ip mtu 1400
    ip tcp adjust-mss 1400
    tunnel source GigabitEthernet1
    tunnel mode ipsec ipv4
    tunnel destination 34.124.13.142
    tunnel protection ipsec profile ikev-profile-ifc-3

ip route 34.124.13.142 255.255.255.255 GigabitEthernet1 192.168.0.1

router bgp 65002
    bgp router-id 100
    bgp log-neighbor-changes
    address-family ipv4 vrf external-vrf
        network 41.41.41.41 mask 255.255.255.255
        neighbor 169.254.0.1 remote-as 65534
        neighbor 169.254.0.1 ebgp-multihop 255
        neighbor 169.254.0.1 activate
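The tunnel source customization mentioned earlier can be scripted before the configuration is applied. The following is a minimal sketch, assuming the downloaded file uses the same "match address local interface" and "tunnel source" lines shown above; the function name set_tunnel_source is hypothetical.

```python
import re


def set_tunnel_source(config_text, interface):
    """Point the IKEv2 profile and the tunnel at the given local interface."""
    def swap(match):
        # Keep the command keyword, replace only the interface name.
        return match.group(1) + interface

    config_text = re.sub(r"(match address local interface )\S+", swap, config_text)
    config_text = re.sub(r"(tunnel source )\S+", swap, config_text)
    return config_text
```

This deliberately leaves the static route and BGP settings untouched, since those depend on the local network and still need manual review.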

Verifying External Connectivity status


Once configuration is applied, there are a few ways to verify BGP peering and IPSec tunnels between GCP and external devices: via CLI on the IPSec device itself and via Cisco CNC GUI on the External Connectivity dashboard.


In the GCP console (infra project), the Hybrid Connectivity page shows that both the IPSec and BGP sessions are established by the combination of a Cloud Router and an HA Cloud VPN gateway, which Cisco CNC automates upon definition of the External Network. Note that the infra VPC network is named overlay-1 by default as part of the Cisco CNC deployment from the marketplace.


Route Leaking Between External and VPC Networks


Now that BGP IPSec tunnels are established, let’s configure inter-VRF routing between external networks and existing user VPC networks from previous sections. This works by enabling VPC peering between the user VPCs and the infra VPC hosting VPN connections, which will share these VPN connections to external sites. Routes received on the VPN connections are leaked to user VPCs, and user VPC routes are advertised on the VPN connections.

Using inter-VRF routing, the route is leaked between the external VRF of the VPN connections and the cloud local user VRFs. The configuration below illustrates route leaking from external-vrf to network-a.


The reverse route leaking configuration from network-a to external-vrf is filtered with Subnet IP to show granularity. Also, the same steps were performed for network-b but not depicted for brevity.


In addition to the existing peering between network-a and network-b VPCs, now both user VPCs are also peered with the infra VPC (overlay-1) as depicted on the high-level topology.


By exploring one of the peering connection details, it is possible to see the external subnet 41.41.41.41/32 in the imported routes table.


On the remote IPSec device, the subnets from network-a and network-b VPCs are learned over BGP peering as expected.

remote-site#sh bgp vpnv4 unicast vrf external-vrf
<<<output omitted for brevity>>>
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf external-vrf)
 *>   41.41.41.41/32   0.0.0.0                  0         32768 i
 *    172.16.1.0/24    169.254.0.5            100             0 65534 ?
 *>                    169.254.0.1            100             0 65534 ?
 *    172.16.128.0/24  169.254.0.5            100             0 65534 ?
 *>                    169.254.0.1            100             0 65534 ?
remote-site#
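A check like the one above can be automated by scanning the command output for the expected prefixes. This is a minimal sketch that assumes the output has already been captured as text; the function names and the regular expression are illustrative, and parsing CLI output this way is inherently fragile.

```python
import ipaddress
import re

# Matches IPv4 prefixes written as a.b.c.d/len.
PREFIX_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})\b")


def learned_prefixes(show_output):
    """Return the set of IPv4 prefixes that appear in the output text."""
    return {
        ipaddress.ip_network(text, strict=False)
        for text in PREFIX_RE.findall(show_output)
    }


def missing_prefixes(show_output, expected):
    """Return the expected prefixes that do not appear in the output."""
    have = learned_prefixes(show_output)
    return [p for p in expected if ipaddress.ip_network(p) not in have]
```

For the output shown above, missing_prefixes(output, ["172.16.1.0/24", "172.16.128.0/24"]) would return an empty list, since both VPC subnets appear in the table.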

Defining External EPG for the External Network


Up to this point, all routing policies were automated by Cisco CNC to allow external connectivity to and from GCP. However, firewall rules are also required for end-to-end connectivity. This is accomplished by creating an external EPG using subnet selection as the endpoint selector to represent external networks. Note that this external EPG is also created within the infra tenant and associated to the external-vrf created previously.


The next step is to apply contracts between the external EPG and the previously created cloud EPGs to allow communication between endpoints in GCP and external networks, which in this scenario are represented by 41.41.41.41/32 (loopback0 on the remote IPSec device). As this is happening across different tenants, the contract scope is set to global and exported from the engineering tenant to the infra tenant, and vice versa if traffic is allowed to be initiated from both sides.

To the cloud connectivity

From the cloud connectivity

On the backend, the combination of contracts and filters translates into proper GCP firewall rules, as covered in detail in Part 2 of this series. For brevity, only the outcome is provided below.

remote-site#ping vrf external-vrf 172.16.1.2 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.1.2, timeout is 2 seconds:
Packet sent with a source address of 41.41.41.41
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 84/84/86 ms

remote-site#ping vrf external-vrf 172.16.128.2 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.128.2, timeout is 2 seconds:
Packet sent with a source address of 41.41.41.41
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 132/133/138 ms

root@web-server:/home/marinfer# ping 41.41.41.41
PING 41.41.41.41 (41.41.41.41) 56(84) bytes of data.
64 bytes from 41.41.41.41: icmp_seq=1 ttl=254 time=87.0 ms
64 bytes from 41.41.41.41: icmp_seq=2 ttl=254 time=84.9 ms
64 bytes from 41.41.41.41: icmp_seq=3 ttl=254 time=83.7 ms
64 bytes from 41.41.41.41: icmp_seq=4 ttl=254 time=83.8 ms
root@web-server:/home/marinfer# 

root@app-server:/home/marinfer# ping 41.41.41.41
PING 41.41.41.41 (41.41.41.41) 56(84) bytes of data.
64 bytes from 41.41.41.41: icmp_seq=1 ttl=254 time=134 ms
64 bytes from 41.41.41.41: icmp_seq=2 ttl=254 time=132 ms
64 bytes from 41.41.41.41: icmp_seq=3 ttl=254 time=131 ms
64 bytes from 41.41.41.41: icmp_seq=4 ttl=254 time=136 ms
root@app-server:/home/marinfer#

Advanced Routing Capabilities with Cisco Cloud Router


Leveraging native routing capabilities as demonstrated may suffice for some use cases but be too limited for others. For more advanced routing capabilities, Cisco Cloud Routers (CCRs) can be deployed instead. The provisioning process is largely the same, with CCRs also instantiated within the infra VPC in a hub-and-spoke architecture. Besides being able to manage the complete lifecycle of the CCRs from Cisco CNC, customers can also choose among tier-based throughput options based on their requirements.

One of the main use cases for Cisco Cloud Routers is BGP EVPN support across different cloud sites running Cisco CNC, or for hybrid cloud connectivity with on-prem sites when policy extension is desirable. The different inter-site use cases are being documented in specific white papers, and below is a high-level topology illustrating the architecture.


Source: cisco.com

Sunday 6 November 2022

Introducing Cisco Cloud Network Controller on Google Cloud Platform – Part 2

Part 1 of this blog series demonstrated how Cisco CNC can automate cloud networking within GCP independently of security policies. Part 2 goes over additional capabilities pertaining to contract-based routing and firewall rules automation by extending the same policy model.

One of the reasons for decoupling routing and security is to give customers more flexibility. Often, organizations have different teams responsible for cloud networking and for security policy definitions in the cloud. However, for use cases where policy consistency is a top priority, followed by tighter governance of cloud resources, a common policy model is a must.

Policy Model Translation


Below is a high-level one-to-one mapping of the Cisco CNC policy model to native GCP cloud constructs.


Essentially, a tenant maps to a project and is the top-level logical container holding all the other policies. For cloud networking, Cisco CNC translates the combination of VRF and Cloud Context Profile into global VPC networks and regional subnets. In the scenario below, Cisco CNC will also translate security policies by combining cloud EPGs (Endpoint Groups) with contracts and filters into firewall rules and network tags in GCP.

By definition, a cloud EPG is a collection of endpoints sharing the same security policy. It can have endpoints in one or more subnets and is tied to a VRF.

Scenario


This scenario has two VRFs: network-a and network-b. Additionally, cloud EPGs Web & App will be created and associated to contracts with specific security policies defined by filters. A Cloud External EPG will also be created as Internet EPG to allow internet access on network-a.


On GCP, these policies are translated into proper VPC networks, subnets, routing tables, peering, firewall rules, and network tags. Note that for this scenario, VPCs and subnets were already pre-provisioned.


Contract-based Routing


In Part 1 of this blog series, a route leak policy was created to allow inter-VRF routing between network-a and network-b. For this scenario, only contract-based routing will be enabled, which means contracts will drive routing where needed. Therefore, the route leak policy created previously was removed and the peering between VPCs disconnected.

Contract-based Routing is a global mode configuration available in the Cloud Network Controller Setup. Note that when contract-based routing is enabled, the routes between a pair of internal VRFs can be leaked using contracts only in the absence of a route leak policy.


Note: a brief overview of the Cisco CNC GUI was provided in Part 1.

Firewall Rules Automation


The configuration below illustrates the creation of Web and Internet EPGs tied to network-a, along with their associated endpoint selectors. Those are used to assign endpoints to a Cloud EPG, and can be based on IP address, Subnet, Region, or Custom tags (using a combination of key value pairs and match expressions).

For the Web EPG, a key value pair is used with specific tags to be matched (custom: epg equals web). For the Internet EPG, a subnet selector is used allowing all traffic. Furthermore, Internet EPG needs to be type External as internet access will be allowed on network-a.

Web EPG

Internet EPG

The Cloud EPG App configuration is not depicted for brevity but is similar to that of cloud EPG Web. However, it is tied to network-b and set with its unique endpoint selector (custom: epg equals app).
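
As a rough illustration of how endpoint selectors assign endpoints to cloud EPGs, here is a minimal Python sketch. It is not how Cisco CNC is implemented; the dictionary layout and the names EPG_SELECTORS, INTERNET_SELECTOR, matches_selector, and classify are hypothetical, and the "allow all" subnet selector is assumed to be 0.0.0.0/0. The tag values (epg equals web, epg equals app) come from the configuration described above, and the endpoint addresses are the ones seen in the ping output in this series.

```python
import ipaddress

# Tag-based selectors for the cloud EPGs, using the tag values from the text.
EPG_SELECTORS = {
    "web": {"type": "tag", "key": "epg", "value": "web"},
    "app": {"type": "tag", "key": "epg", "value": "app"},
}

# Assumption: the "allow all traffic" subnet selector is modeled as 0.0.0.0/0.
INTERNET_SELECTOR = {"type": "subnet", "cidr": "0.0.0.0/0"}


def matches_selector(endpoint, selector):
    """Return True if the endpoint satisfies a tag or subnet selector."""
    if selector["type"] == "tag":
        return endpoint["labels"].get(selector["key"]) == selector["value"]
    if selector["type"] == "subnet":
        address = ipaddress.ip_address(endpoint["ip"])
        return address in ipaddress.ip_network(selector["cidr"])
    raise ValueError(f"unsupported selector type: {selector['type']}")


def classify(endpoint):
    """List the cloud EPGs whose selector matches the endpoint."""
    return [name for name, selector in EPG_SELECTORS.items()
            if matches_selector(endpoint, selector)]


# Hypothetical endpoint records carrying the labels set on the GCP instances.
web_server = {"ip": "172.16.1.2", "labels": {"epg": "web"}}
app_server = {"ip": "172.16.128.2", "labels": {"epg": "app"}}

print(classify(web_server))
print(classify(app_server))
```

An endpoint with no matching label falls into no cloud EPG here, which mirrors the idea that only endpoints matching a selector receive the EPG's network tag and firewall rules.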

On GCP, these policies get translated to dedicated ingress firewall rules and network tags for Web and App as highlighted using the following format: capic-<app-profile-name>-<epg-name>.


Note: Rebranding from Cloud APIC to Cloud Network Controller is covered in Part 1.

In the example below, cloud endpoints instantiated in GCP with labels matching the endpoint selectors are assigned to network tags and firewall rules automated by Cisco CNC.


Associating Contracts to EPGs

Now, let’s associate the web-to-app contract between Web and App EPGs using the concept of consumer and provider to define rules direction.


Upon associating the contract, additional ingress and egress firewall rules are programmed depending on the consumer and provider relationship specified. Specifically, these firewall rules are updated based on security policies defined through contracts and filters. For brevity, all traffic is allowed but granular filters can be added per requirements. On another note, these rules are only programmed once cloud endpoints matching the rules are instantiated.


Wait, what about peering between these VPCs? Since contract-based routing is enabled, the contract also drives routing by enabling peering and auto-generating routes between the two VPCs accordingly.


Lastly, let’s allow internet access to web services residing on network-a by adding the internet-access contract between Internet and Web EPGs.


As soon as the contract is associated, Cisco CNC adds an ingress firewall rule with network tags representing the Web EPG which allows internet access to endpoints behind it.


From this point on, internet access to web-server is allowed as well as connectivity from the web-server to the app-server.

root@web-server:/home/marinfer# ifconfig ens4
ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 172.16.1.2  netmask 255.255.255.255  broadcast 172.16.1.2
        inet6 fe80::4001:acff:fe10:102  prefixlen 64  scopeid 0x20<link>
        ether 42:01:ac:10:01:02  txqueuelen 1000  (Ethernet)
        RX packets 19988  bytes 3583929 (3.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 17707  bytes 1721956 (1.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
root@web-server:/home/marinfer# ping 172.16.128.2
PING 172.16.128.2 (172.16.128.2) 56(84) bytes of data.
64 bytes from 172.16.128.2: icmp_seq=1 ttl=64 time=58.3 ms
64 bytes from 172.16.128.2: icmp_seq=2 ttl=64 time=56.0 ms
64 bytes from 172.16.128.2: icmp_seq=3 ttl=64 time=56.0 ms
64 bytes from 172.16.128.2: icmp_seq=4 ttl=64 time=56.0 ms

Cloud Resources Visibility


Using a cloud-like policy model, Cisco CNC provides a topology and hierarchical view of cloud resources on a per tenant basis with drill down options. Moreover, application profile containers group together cloud EPGs and associated contracts for easy visibility of policies and dependencies.


More granular visibility is provided all the way to cloud endpoints. Firewall rules are also visible via Cisco CNC GUI under Ingress and Egress Rules.


Source: cisco.com

Saturday 5 November 2022

Battle of the Fabrics – The Road to a Future Ready Simple Network

The Evolution of Enterprise Networks for Campus


Digital transformation is creating new opportunities in every industry. In healthcare, doctors can monitor patients remotely and leverage medical analytics to predict health issues. In education, technology enables a connected campus and more personalized and equal access to learning resources. Within retail, shops can provide a seamless, engaging experience in-store and online using location awareness. In the world of finance, technology enables users to securely bank anywhere, anytime, using the device of their choice. In today’s world, digital transformation is essential for businesses to stay relevant.

These digital transformations have required more from networks than ever before. Over time, campus design has been forever changed by the additional demands on the network, each requiring more capabilities and flexibility than previous network designs. Over the past ten years, the enterprise network has continued to evolve from traditional designs to enterprise Fabrics that resemble a service provider design and encompass an Underlay and Overlay.

Fundamentally, it’s essential to understand what typical IT departments, even those segmented within organizations, are attempting to achieve. Ultimately, each company has an IT department to deliver applications that the company relies on to achieve some aim, whether for the public good or for monetary reasons, which could take on many forms, from Manufacturing to Retail, to Financial and beyond. If you look at the core ask, these organizations want a service delivered at some service level to ensure business continuity. For that reason, when the organization introduces new applications or devices, we need to flexibly adopt these new entities securely and simultaneously roll these changes out to the network.

Additionally, more emphasis is being placed on pushing configuration changes quickly, accurately, securely, and at scale while balancing that with accountability. Automation and orchestration are critical to the network of the future, and the ability to tie them into a platform that not only applies configuration but also measures success through both application and user experience is fundamental.

For any organization to successfully transition to a digital world, investment in its network is critical. The network connects all things and is the cornerstone where digital success is realized or lost. The network is the pathway for productivity and collaboration and an enabler of improved end-user experience. And the network is also the first line of defense in securing enterprise assets and intellectual property.

Essentially, everyone in networking is looking for the easy button. We are all looking to reduce the number of devices and the complexity while maintaining the flexibility to support the business’s priorities from both an application and endpoint perspective. If we can simplify and build a highly available network of the future that is easily extensible, flexible enough to meet our needs, fully automated, and rich in telemetry, then perhaps we are headed toward that nirvana.

A Fabric can be that solution and is the road to a future-ready, simple network. We remove the reliance on 15 to 20 protocols in favor of 3, reducing operational complexity. We fully integrate all wired and wireless access components and utilize the bandwidth available on many links to support future technologies like Wi-Fi 6E and beyond. We bind policy into the ecosystem and use the network to apply and enforce that policy. We can learn intrinsically from the network through telemetry and use Artificial Intelligence and Machine Learning to solve issues in a guided and even automated way. We will discuss all these concepts in more detail in the next couple of sections.

Fabric Overview


Figure 1. Fabric Concepts with Underlay and Overlay
A Fabric is simply an Overlay network. Overlays are created through encapsulation, which adds one or more additional headers to the original packet or frame. An overlay network creates a logical topology to virtually connect devices built over an arbitrary physical Underlay topology.

In an idealized, theoretical network, every device would be connected to every other. In this way, any connectivity or topology imagined could be created. While this theoretical network does not exist, there is still a technical desire to connect all these devices in a full mesh. This is where the term Fabric comes from: it is a cloth where everything is connected. An Overlay (or tunnel) provides this logical full-mesh connection in networking. We can then automate the build of these networks of the future, replacing the older L2/L3 protocols (often 15 to 20 of them) with as few as 3. This allows us to have a simple, flexible, fully automated approach where wired and wireless can be incorporated into the Overlay.

Underlay

The Underlay network is defined by the physical switches and routers used to deploy the Fabric. All network elements of the Underlay must establish IP connectivity via the use of a routing protocol. The Fabric Underlay supports any arbitrary network topology. Instead of using arbitrary network topologies and protocols, the underlay implementation for a Fabric typically uses a well-designed Layer 3 foundation inclusive of the Campus Edge switches, known as a Layer 3 Routed Access design. This ensures the network’s performance, scalability, resiliency, and deterministic convergence.

The Underlay switches support the physical connectivity for users and endpoints. However, end-user subnets and endpoints are not part of the Underlay network; they are part of the automated Overlay network.

Overlay

An Overlay network is a logical topology used to virtually connect devices and is built over an arbitrary physical Underlay topology. The Fabric Overlay network is created on top of the Underlay network through virtualization, creating Virtual Networks (VN). The data, traffic, and control plane signaling are contained within each Virtual Network, maintaining isolation among the networks and independence from the Underlay network. Multiple Overlay networks can run across the same Underlay network through virtualization.

Virtual Networks

Fabrics provide Layer 3 and Layer 2 connectivity across the Overlay using Virtual Networks (VN). Layer 3 Overlays emulate an isolated routing table and transport Layer 3 packets over the Layer 3 network. This type of Overlay is called a Layer 3 Virtual Network. A Layer 3 Virtual Network is a virtual routing domain analogous to a Virtual Routing and Forwarding (VRF) table in a traditional network.

Layer 2 Overlays emulate a LAN segment and transport Layer 2 frames over the Layer 3 network. This type of Overlay is called a Layer 2 Virtual Network. Layer 2 Virtual Networks are virtual switching domains analogous to a VLAN in a traditional network.

Each frame from an endpoint within a VN is forwarded in the encapsulated tunnel toward its destination, much as older designs used labels to encapsulate traffic in MPLS networks. To forward the packet, we need some form of tracking capability to determine where the destination is. This is accomplished by the Control Plane of the Fabric. In older MPLS networks, and those used by service providers, the control plane was a combination of LDP/TDP for propagating labels and BGP, whose extensions were used to separate routing into various VNs.

Control Plane

To forward traffic within each Overlay, we need a way of mapping where the sources and destinations are located. Typically, the IP address and MAC address are associated with an endpoint and are used to define its identity and location in the network. The IP address is used to identify at layer 3 who and where the device is on the network. At layer 2, the MAC address can also be used within broadcast domains for host-to-host communications when layer 2 is available. This is commonly referred to as addressing following topology. While an endpoint’s location in the network will change, who this device is and what it can access should not have to change.

Additionally, the ability to reduce fault domains and remove Spanning Tree Protocol (STP) is a big driver of the need for routed access, removing the reliance on a technology that often had slower convergence times. To give a Layer 3 Routed Network the same kind of capabilities, we first need to track those endpoints and then forward traffic between them, and off the network to external destinations when needed, for example for internet connectivity.

This is the role and function of the Control Plane, whose job it is to track Endpoint Identifiers (EID), more commonly referred to as Endpoints within a Fabric Overlay. This allows the Fabric to forward that traffic in an encapsulated packet, separating it from the other VNs and thus automatically providing Macro Segmentation while the traffic makes its way through the Fabric to the destination. There are different Fabric technologies, and each utilizes some form of Control Plane to centralize this mapping system, which both the border and edge nodes rely on. Each technology has its pros and cons, which become caveats that we must take into account when designing and choosing between Fabric technologies.

Locator/ID Separation Protocol (LISP) 

Cisco Software-Defined Access (Cisco SD-Access) utilizes the Locator/ID Separation Protocol (LISP) as the Control Plane protocol. LISP simplifies network operations through mapping servers and allows the resolution of endpoints within a Fabric. One of the benefits of this approach is that endpoint prefixes are resolved on demand and are not installed in the Routing Information Base. It therefore places little burden on edge switches, which have less memory and CPU than the larger core devices, and allows us to expand the Fabric right down to the Edge.

LISP, defined in RFC 6830, allows the separation of identity and location through a mapping relationship between two namespaces: the EID and its Routing LOCator (RLOC). These EID-to-RLOC mappings are held in the mapping servers, which are highly available throughout the Fabric and which resolve EIDs to RLOCs in the same way Domain Name System (DNS) servers resolve web addresses, using a PULL-type update. This allows for greater scale when deploying the protocols that make up the Fabric’s Control Plane. It allows us to fully utilize the capabilities of both Virtual Networks (namespaces) and encapsulation or tunneling. Traffic is encapsulated from end to end, which enables the use of consistent IP addressing across the network behind multiple Layer 3 anycast gateways across multiple edge switches. Thus, instead of a push from the routing protocol, conversational learning occurs, where forwarding entries are populated in Cisco Express Forwarding only where they are needed.

Figure 2. LISP Control Plane Operation

Instead of a typical traditional routing-based decision, the Fabric devices query the control plane node to determine the routing locator associated with the destination address (EID-to-RLOC mapping) and use that RLOC information as the traffic destination.  In case of a failure to resolve the destination routing locator, the traffic is sent to the default Fabric border node. The response received from the control plane node is stored in the LISP map cache, driving the Cisco Express Forwarding (CEF) table and installed in hardware. This gives us an optimized forwarding table without needing a routing protocol update and saves CPU and memory utilization.
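
To make the pull-based lookup concrete, here is a minimal Python sketch of the flow described above: query the control plane node on a cache miss, store the answer in the map cache, and fall back to the default border node when the EID cannot be resolved. This is a toy model rather than a LISP implementation; the class names MapServer and EdgeNode, the method names, and the sample EIDs and RLOCs are all hypothetical.

```python
class MapServer:
    """Toy control plane node holding EID-to-RLOC mappings."""

    def __init__(self):
        self._mappings = {}
        self.queries = 0  # counts lookups so the caching effect is visible

    def register(self, eid, rloc):
        self._mappings[eid] = rloc

    def resolve(self, eid):
        self.queries += 1
        return self._mappings.get(eid)  # None when the EID is unknown


class EdgeNode:
    """Toy fabric edge that learns mappings only for active conversations."""

    def __init__(self, map_server, default_border_rloc):
        self.map_server = map_server
        self.default_border_rloc = default_border_rloc
        self.map_cache = {}

    def next_hop(self, dst_eid):
        # Cache hit: no control plane query is needed.
        if dst_eid in self.map_cache:
            return self.map_cache[dst_eid]
        rloc = self.map_server.resolve(dst_eid)
        if rloc is None:
            # Unresolved destination: send to the default border node.
            # Simplification: the miss is not cached in this sketch.
            return self.default_border_rloc
        self.map_cache[dst_eid] = rloc
        return rloc


server = MapServer()
server.register("10.1.1.10", "192.168.255.2")  # hypothetical EID and RLOC
edge = EdgeNode(server, default_border_rloc="192.168.255.1")

print(edge.next_hop("10.1.1.10"))  # first packet triggers a query
print(edge.next_hop("10.1.1.10"))  # answered from the map cache
print(edge.next_hop("8.8.8.8"))    # unknown EID goes to the border
print(server.queries)
```

The edge node ends up holding entries only for destinations it actually talks to, which is the conversational learning behavior the text contrasts with a routing protocol push.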

Border Gateway Protocol (BGP) 

Conversely, Border Gateway Protocol (BGP), which has been heavily augmented over the years, was initially designed for routing between organizations across the internet. Kirk Lougheed and Len Bosack of Cisco and Yakov Rekhter of IBM devised BGP at an Internet Engineering Task Force (IETF) conference in 1989, and it was published that year as RFC 1105. Cisco has been heavily invested in innovations, maintenance, and adoption of the protocol suite ever since and, over the years, has helped design and add various capabilities to its toolset. BGP forms the core routing protocol of many service provider networks, primarily because of its ability to have a policy-based routing approach. BGP routes are installed in the Routing Information Base (RIB) within the network devices of the Fabric. Updates are provided by the protocol to a full mesh of BGP nodes in a PUSH-type fashion. While they can be controlled via policy, by default, all routes are typically shared.

As BGP consumes space within the RIB, let’s evaluate this further, as the implications are extensive. Each endpoint in a Dual-Stack network (IPv4 and IPv6 enabled) utilizes two entries for IPv4: its MAC address and its IPv4 address as its network prefix. This is effectively 1 network prefix with 2 EIDs for each endpoint in IPv4. Similarly, in IPv6, each endpoint would have a Link-Local address, a host address, and a multicast-type address entry similar to the network prefix. Each IPv6 address consumes 2 entries, and thus we have another 4 entries per endpoint, all of which would be needed within the RIB on all BGP-enabled nodes within the Fabric, as it is a full mesh design. Additionally, the routing protocol must maintain those adjacencies and update each peer as endpoints move across the Fabric. Due to the processing required in the BGP control plane on every update, there is a higher need for CPU and memory resources as the EID entries change or move within the Fabric.

Figure 3. BGP Protocol

In the figure above, you will see that utilizing BGP as the control plane requires that the edge device first maintain routing adjacencies, process updates using its best-path algorithm, then install the update in the Forwarding Information Base (FIB) within the CEF table.

Most access switches, called Edge Nodes within Fabrics, have smaller RIB capabilities than the cores they peer with. Typically you will see 32000 entries available on most of the current lines of switching for Edge Nodes. This is quickly consumed by the number of addresses per endpoint, leaving room for fewer devices unless we employ policies and filtering. Thus, to accommodate scale, we need policy, which means we need to modify BGP for its use in a Fabric. As devices roam throughout the network, it is important to understand that updates for each device will be propagated by BGP to every node within that full mesh network. To use our DNS analogy: for each roaming event, instead of a specific DNS query, we force a DNS Zone Transfer.
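
The arithmetic behind that claim can be sketched in a few lines of Python. The figures are the ones quoted in this article (2 entries per endpoint for IPv4, another 4 for IPv6, and 32000 entries on an Edge Node); the constant and function names are hypothetical, and the result is only an upper bound because it ignores every other route the table has to hold.

```python
# Figures taken from the article; real platforms and designs will differ.
IPV4_ENTRIES_PER_ENDPOINT = 2   # MAC address + IPv4 address
IPV6_ENTRIES_PER_ENDPOINT = 4   # two IPv6 addresses at 2 entries each
EDGE_NODE_CAPACITY = 32_000     # entries available on a typical Edge Node


def max_endpoints(capacity, entries_per_endpoint):
    """Upper bound on endpoints if every endpoint is installed on every node."""
    return capacity // entries_per_endpoint


dual_stack_entries = IPV4_ENTRIES_PER_ENDPOINT + IPV6_ENTRIES_PER_ENDPOINT

print(dual_stack_entries)                                            # 6
print(max_endpoints(EDGE_NODE_CAPACITY, IPV4_ENTRIES_PER_ENDPOINT))  # 16000
print(max_endpoints(EDGE_NODE_CAPACITY, dual_stack_entries))         # 5333
```

Under these assumptions, a full mesh in which every Edge Node carries every dual-stack endpoint tops out at roughly 5,300 endpoints across the whole Fabric, not per switch.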

Another approach is to end the BGP routing at the larger, more powerful core and distribution switches and resort to Layer 2 trunks below. Here we would utilize STP, which has slower convergence times in the event of link failures. Although STP can be tuned, the network then has lower reliability and availability than other solutions. As soon as we need to rely on those Layer 2 protocols, our Fabric has diminished benefits, and we have not achieved the goal of simplification.

Data Plane

Forwarding traffic within each Overlay, once sources and destinations are located, is the role of the Data Plane. Traffic in Overlays utilizes encapsulation, and many forms of it have been used in use cases ranging from large enterprises to service provider networks the globe over. In service provider networks, a typical encapsulation is Multi-Protocol Label Switching (MPLS), which encapsulates each packet and utilizes a labeling method to segment traffic. The labeling in MPLS networks was later modified to simplify convergence through the use of Segment Identifiers (SID) for Segment Routing, which had several advantages in convergence over LDP-learned labels. SIDs are propagated within the IGP routing updates of both OSPF and IS-IS. This was far superior to the hop-by-hop convergence of LDP, which converged after the IGP came up and was known to cause issues.

Figure 4. MPLS Header Explained

We typically utilize Virtual Extensible LAN (VXLAN) in enterprise networks within Fabrics. VXLAN is an encapsulation protocol that tunnels original data packets, unchanged, across the network. This protocol-in-protocol approach has been used for decades to allow lower-layer or same-layer protocols (from the OSI model) to be carried through tunnels, creating Overlays such as the pseudowires used in xConnect.

VXLAN is a MAC-in-IP encapsulation method.  It provides a way to carry lower-layer data across the higher Layer 3 infrastructure.  Unlike routing protocol tunneling methods, VXLAN preserves the original Ethernet header from the original frame sent from the endpoint.  This allows for the creation of an Overlay at Layer 2 and at Layer 3, depending on the needs of the original communication.  For example, Wireless LAN communication (IEEE 802.11) uses Layer 2 datagram information (MAC Addresses) to make bridging decisions without a direct need for Layer 3 forwarding logic.

Figure 5. Fabric VXLAN (VNI) Encapsulation Overhead

Any encapsulation method is going to create additional MTU (maximum transmission unit) overhead on the original packet.  As shown in figure 5 above, VXLAN encapsulation uses a UDP transport.  Along with the VXLAN and UDP headers used to encapsulate the original packet, an outer IP and Ethernet header are necessary to forward the packet across the wire.  At a minimum, these extra headers add 50 bytes of overhead to the original packet.
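
The 50-byte figure is simply the sum of the standard header sizes, as this small Python sketch shows. The function name encap_overhead and its parameters are hypothetical; the sizes assume an outer IPv4 header without options and an untagged outer Ethernet header, which is the minimum case the text refers to.

```python
OUTER_ETHERNET = 14  # destination MAC + source MAC + EtherType
DOT1Q_TAG = 4        # optional 802.1Q tag on the outer frame
OUTER_IPV4 = 20      # IPv4 header without options
OUTER_IPV6 = 40      # fixed IPv6 header
UDP = 8
VXLAN = 8


def encap_overhead(outer_ip_header=OUTER_IPV4, dot1q_tag=False):
    """Bytes added in front of the original frame by VXLAN encapsulation."""
    ethernet = OUTER_ETHERNET + (DOT1Q_TAG if dot1q_tag else 0)
    return ethernet + outer_ip_header + UDP + VXLAN


print(encap_overhead())                            # 50
print(encap_overhead(dot1q_tag=True))              # 54
print(encap_overhead(outer_ip_header=OUTER_IPV6))  # 70
```

Anything beyond the minimum, such as an 802.1Q tag on the underlay link or an IPv6 underlay, pushes the overhead higher, so the underlay MTU has to be sized with that in mind.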

Cisco SD-Access and VXLAN

Cisco SD-Access places additional information in the Fabric VXLAN header, including alternative forwarding attributes that can be used to make policy decisions by identifying each Overlay network using a VXLAN network identifier (VNI).  Layer 2 Overlays are identified with a VLAN to VNI correlation (L2 VNI), and Layer 3 Overlays are identified with a VRF to VNI correlation (L3 VNI).

Figure 6. Fabric VXLAN Alternative Forwarding Attributes

As you may recall, Cisco TrustSec decoupled access control from IP addresses and VLANs by using logical groupings in a method known as Group-Based Access Control (GBAC). The goal of Cisco TrustSec technology was to assign a Security Group Tag (SGT) value to the packet at its ingress point into the network. An access policy elsewhere in the network is then enforced based on this tag information. As an SGT is a form of metadata, a 16-bit value assigned by ISE in an authorization policy when a user, device, or application connects to the network, we can encode the SGT and VRF values into the header and carry them across the Overlay. Carrying the SGT within the VXLAN header allows us to utilize it for egress enforcement anywhere in the network and provides Micro and Macro Segmentation capability.

Figure 7. VXLAN-GBP Header

Cisco SD-Access Fabric uses the VXLAN data plane to transport the full original Layer 2 frame and uses LISP as the control plane to resolve endpoint-to-location (EID-to-RLOC) mappings. Cisco SD-Access Fabric repurposes sixteen (16) of the reserved bits in the VXLAN header to transport up to 64,000 SGTs using a modified VXLAN-GPO, sometimes called VXLAN-GBP, which is backward compatible with RFC 7348.
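
For readers who want to see where those 16 bits sit, below is a minimal Python sketch that packs and unpacks an 8-byte VXLAN header carrying a group tag. The layout follows the VXLAN Group Policy Option Internet-Draft as I understand it (a G flag, the I flag, a 16-bit Group Policy ID, then the 24-bit VNI); it is not taken from Cisco SD-Access code, and the function names and sample VNI and SGT values are hypothetical.

```python
import struct

G_FLAG = 0x80  # Group Policy ID field is present
I_FLAG = 0x08  # VNI field is valid (the only flag set in plain RFC 7348)


def pack_vxlan_gbp(vni, sgt):
    """Build an 8-byte VXLAN header carrying a 16-bit group tag."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= sgt < 2 ** 16:
        raise ValueError("SGT must fit in 16 bits")
    # Byte 0: flags, byte 1: policy flags (left at zero in this sketch),
    # bytes 2-3: Group Policy ID, bytes 4-6: VNI, byte 7: reserved.
    return struct.pack("!BBHI", G_FLAG | I_FLAG, 0, sgt, vni << 8)


def unpack_vxlan_gbp(header):
    """Return (vni, sgt) from an 8-byte VXLAN-GBP header."""
    _flags, _policy_flags, sgt, vni_word = struct.unpack("!BBHI", header)
    return vni_word >> 8, sgt


header = pack_vxlan_gbp(vni=8190, sgt=17)  # hypothetical values
print(header.hex())
print(unpack_vxlan_gbp(header))
```

In a plain RFC 7348 header the two bytes holding the group tag are reserved and set to zero, which is why a device that ignores the G flag can still read the VNI from the same position.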

BGP-EVPN and VXLAN

VXLAN is defined in RFC 7348 as a way to Overlay a Layer 2 network on top of a Layer 3 network. Each Overlay network is called a VXLAN segment and is identified using a 24-bit VXLAN network identifier, which supports up to 16 million VXLAN segments. Without the Cisco modifications to VXLAN, the IETF format would not support SGTs within the header, which would preclude the use of egress enforcement and Micro-Segmentation without forwarding the packet to an enforcement device like a firewall (router on a stick) or deploying downloadable ACLs, which add additional load to the TCAM.

Figure 8. IETF VXLAN Header

Fabric Benefits


When we start to review the various benefits of one Fabric design over the other, there are capabilities that differentiate them. Each Fabric design has something to offer and plays to its strengths. It’s important to clearly understand what benefit you can have from a technology and what the technology solves for you. In this section, we will look at what problems can be solved with each design.

Deploying a Fabric architecture provides the following advantages:

◉ Scalability — VXLAN provides Layer 2 connectivity, allowing for infrastructure that can scale to 16 million tenant networks. It overcomes the 4094-segment limitation of VLANs. This is necessary to address today’s multi-tenant cloud requirements.

◉ Flexibility — VXLAN allows workloads to be placed anywhere, along with the traffic separation required, in a multi-tenant environment. The traffic separation is done by network segmentation using VXLAN segment IDs or VXLAN network identifiers (VNIs). Workloads for a tenant can be distributed across different physical devices, but they are identified by their respective Layer 2 VNI or Layer 3 VNI.

◉ Mobility — IP Mobility within the Fabric and IP address reuse across the Fabric.

◉ Automation — Various methods may be used to automate and orchestrate the Fabric deployment from a purpose-built controller to Ansible, NSO, and Terraform, thereby alleviating some of the problems with error-prone manual configuration.

Cisco SD-Access

This Fabric technology has many additional benefits that come with its deployment. Cisco SD-Access is built on an Intent-based Networking foundation that encompasses visibility, automation, security, and simplification. Using Cisco DNA Center automation and orchestration, network administrators can implement changes across the entire enterprise environment through an intuitive, GUI-based interface. Using that same controller, they can build enterprise-wide Fabric architectures, classify endpoints for security grouping, create and distribute security policies, and monitor network performance and availability.

SD-Access secures the network at the macro- and micro-segmentation level using Virtual Routing and Forwarding (VRF) tables and Security Group Tags (SGTs), respectively. This is called Multi-Tier Segmentation, which is difficult to achieve well in traditional networks. This segmentation happens at the access port level, which means the security boundary is pushed to the very edge of the network infrastructure for both wired and wireless clients.

With Multi-Tier Segmentation, network administrators no longer have to undertake configurations in anticipation of a user or device move, as all of the security contexts associated with a user or device are dynamically assigned when they authenticate their network connection. Cisco SD-Access provides the same security policy capabilities whether the user or device is attached via a wired or wireless medium, so secure policy consistency is maintained as the user or device changes the attachment type.

Instead of relying on IP-based security rules as in a traditional network, Cisco SD-Access relies on centralized group-based security rules utilizing SGTs that are IP-address agnostic. As a user or device moves from location to location and changes IP addresses, their security policy remains the same, because their group membership is unchanged regardless of where they access the network. This reduces pressure on network administrators since they do not have to create as many rules or manually update them on different devices. This, in turn, leads to a more dynamic, scalable, and stable environment for network consumers, without relying on older technologies like PVLANs or introducing an enforcement bottleneck.
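The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Endpoint` class, the SGT numbers, the policy matrix, and the default-deny behavior are all hypothetical and are not taken from Cisco SD-Access itself. The point is that the decision uses the VRF (macro) and the SGT pair (micro) and never looks at the IP address.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    ip: str    # changes as the endpoint roams
    vrf: str   # macro-segmentation: virtual network
    sgt: int   # micro-segmentation: security group tag


# Hypothetical group-based policy matrix: (source SGT, destination SGT) -> permit?
POLICY = {
    (10, 20): True,   # e.g. Employees -> Printers
    (30, 20): False,  # e.g. Guests -> Printers
}


def is_permitted(src: Endpoint, dst: Endpoint) -> bool:
    """Apply macro- then micro-segmentation; IP addresses are never consulted."""
    if src.vrf != dst.vrf:
        return False  # different virtual networks do not talk directly in this sketch
    return POLICY.get((src.sgt, dst.sgt), False)  # default deny is an assumption


laptop = Endpoint("laptop", "10.1.1.50", vrf="CORP", sgt=10)
printer = Endpoint("printer", "10.1.9.7", vrf="CORP", sgt=20)
before_move = is_permitted(laptop, printer)

laptop.ip = "10.2.4.88"  # the user roams and gets a new address
after_move = is_permitted(laptop, printer)
print(before_move, after_move)
```

Changing a rule means editing one entry in the matrix, not rewriting one ACL line per IP address.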

How can a network be both dynamic and stable at the same time? When a rule does have to be created or changed, it can be done for all users of a group in Cisco DNA Center. Those rules are then dynamically populated to all relevant network devices that need that rule, ensuring both accuracy and speed for the update. Additionally, wired and wireless network devices may be managed from one automation and orchestration manager, allowing the same rules, policies, and forwarding methods to be adopted across the entire network. With the addition of pxGrid integrations with ISE, the security policies can be adopted by almost any security-enabled platform, which dramatically simplifies policy enforcement and eases the manageability problems of maintaining ACLs.

When we analyze the solution more deeply and objectively, it is important to understand how the control plane functions and what the ultimate limitations of the technology might be. A MAC move occurs when an endpoint (or host) moves from one port to another. The new port may be on the same edge node or on a different edge node, in the same VLAN. Each edge node has a LISP control-plane session with all control plane nodes. After an endpoint is detected by the edge node, it is added to a local database called the EID table. Once the host is added to this local database, the edge node also issues a LISP map-register message to inform the control plane node of the endpoint, so that the central Host Tracking Database (HTDB) is updated. A host may move several times, and each time a move occurs, the HTDB is updated.
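A toy model helps show why a single central database keeps one location per endpoint. The sketch below is not LISP itself: the class and method names are hypothetical, the messages are plain method calls, and it leaves out details such as invalidating stale caches or cleaning up the old edge node after a move. It also includes the on-demand lookup that an edge node performs when it has no local entry.

```python
class ControlPlaneNode:
    """Toy stand-in for the control plane node's host tracking database (HTDB)."""

    def __init__(self):
        self.htdb = {}  # EID -> RLOC of the edge node that last registered it

    def map_register(self, eid, rloc):
        # A newer registration overwrites the older one, so an EID
        # maps to exactly one RLOC at any time.
        self.htdb[eid] = rloc

    def map_request(self, eid):
        return self.htdb.get(eid)


class EdgeNode:
    def __init__(self, rloc, control_plane):
        self.rloc = rloc
        self.control_plane = control_plane
        self.local_eids = set()  # endpoints attached to this edge node
        self.map_cache = {}      # remote EID -> RLOC, learned on demand

    def endpoint_detected(self, eid):
        self.local_eids.add(eid)
        self.control_plane.map_register(eid, self.rloc)

    def resolve(self, eid):
        """Return the RLOC to encapsulate towards, querying only on a cache miss."""
        if eid in self.local_eids:
            return self.rloc
        if eid not in self.map_cache:
            rloc = self.control_plane.map_request(eid)
            if rloc is None:
                return None
            self.map_cache[eid] = rloc
        return self.map_cache[eid]


cp = ControlPlaneNode()
edge1 = EdgeNode("192.0.2.1", cp)
edge2 = EdgeNode("192.0.2.2", cp)
edge3 = EdgeNode("192.0.2.3", cp)

edge1.endpoint_detected("00:11:22:33:44:55")
first_lookup = edge3.resolve("00:11:22:33:44:55")

edge2.endpoint_detected("00:11:22:33:44:55")  # the host moves to edge2
htdb_after_move = cp.map_request("00:11:22:33:44:55")
print(first_lookup, htdb_after_move, len(cp.htdb))
```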

Thus there is never a case where the Fabric has the same entry on two edge nodes, because the HTDB is used as the reference point for endpoint tracking when packets are forwarded. Each register message from the edge node includes an EID-RLOC entry for the endpoint, which maps an Endpoint IDentifier (EID) to a Routing LOCator (RLOC). Within LISP, each edge node has a management IP, or RLOC, that identifies it individually. When an edge node receives a packet, it checks its local database for an EID-RLOC entry. If the entry does not exist, a query is sent to the LISP control plane so that the EID can be resolved to an RLOC.

Packets and frames received from the endpoint, either directly connected to an edge node or reaching it by way of an extended node or access point, are encapsulated in Fabric VXLAN and forwarded across the Overlay. Traffic is sent to another edge node or to the border node, depending on the destination. When Fabric-encapsulated traffic is received for the endpoint, such as from a border node or another edge node, it is de-encapsulated and sent to that endpoint. This encapsulation and de-encapsulation of traffic enables the location of an endpoint to change, as the traffic can be encapsulated towards different edge nodes in the network without the endpoint having to change its address.

Additionally, the local database on the receiving edge node is automatically updated during this conversation for the reverse traffic flow. This is conversational learning in the literal sense: the updates occur as traffic is forwarded from one switch to another, on an as-needed basis.

Lastly, most customers want to simplify the management of the network infrastructure and are looking for the “One ring to rule them all, one ring to find them, one ring to bring them all” in some sort of Single Pane of Glass.
Networking is expansive: each vendor has its own management platform, and each comes with different capabilities. From a Cisco perspective, DNA Center allows for the automation and orchestration of Fabrics and Traditional networks from one platform. It brings that power to all of our Enterprise Networking portfolio, integrates with ISE, Viptela, and Meraki, and connects externally to an Ecosystem of products like DNA Spaces, ServiceNow, Infoblox, Splunk, Tableau, and many more. Additionally, you can bring your own Orchestrator and orchestrate through DNA Center, which allows organizations to adopt an Infrastructure as Code methodology.

To recap, there are four primary reasons that make Cisco SD-Access superior to traditional network deployments:

◉ Complexity reduction and operational consistency through orchestration and automation
◉ Multi-Tier Segmentation, which includes group-based policies and partitioning at Layer 2 and Layer 3
◉ Dynamic policy mobility for wired and wireless clients
◉ IP subnet pool conservation across the SD-Access Fabric

BGP-EVPN

BGP EVPN VXLAN can be used as a Fabric technology in a campus network with Cisco Catalyst 9000 Series Switches running Cisco IOS XE software. This solution is a result of proposed IETF standards and Internet drafts submitted by the BGP Enabled ServiceS (bess) working group. It is designed to provide a unified Overlay network solution and to address the challenges and drawbacks of existing technologies. The proposals use BGP to carry Layer 2 MAC and Layer 3 IP information simultaneously, and BGP incorporates Network Layer Reachability Information (NLRI) to achieve this. With MAC and IP information available together for forwarding decisions, routing and switching within a network are optimized. This also minimizes the use of the conventional “flood and learn” mechanism used by VXLAN and allows for scalability in the Fabric. EVPN is the extension that allows BGP to transport Layer 2 MAC and Layer 3 IP information. This deployment is called a BGP EVPN VXLAN Fabric (also referred to as VXLAN fabric).

This solution provides a Fabric composed of industry-standard protocols and offers a unified Fabric across Campus and Data Centers. This Fabric is interoperable with third-party devices, so it allows for multi-vendor support and, at the same time, is Brownfield-friendly. It also allows for rich multicast support with Tenant Routed Multicast and both L2 and L3 support.

This solution may also be deployed and managed by various automation and orchestration methods, such as Ansible, Terraform, and Cisco’s NSO platform. While these platforms do offer robust automation and orchestration, they do not have the monitoring capability to look at model-driven telemetry. Nor do they tie the richness of Artificial Intelligence and Machine Learning into the solution to help with Day N operations like troubleshooting and fault finding. Visibility into both the user and application experience therefore often means standing up a separate platform that is not combined with the automation tooling.

When we analyze the solution more deeply and objectively, it is important to understand how the control plane functions and what the ultimate limitations of the technology might be. A MAC move occurs when an endpoint (or host) moves from one port to another. The new port may be on the same VTEP or on a different VTEP, in the same VLAN. The BGP EVPN control plane resolves such moves by advertising MAC routes (EVPN route type 2). When an endpoint’s MAC address is learned on a new port, the new VTEP advertises on the BGP EVPN control plane that it is the local VTEP for the host. All other VTEPs receive the new MAC route.

A host may move several times, causing the corresponding VTEPs to advertise as many MAC-based routes. There may also be a delay between the time a new MAC route is advertised and the time the old route is withdrawn from the route tables of other VTEPs, resulting in two locations briefly having the same MAC route. Here, a MAC mobility sequence number helps decide which of the MAC routes is the most current. When the host MAC address is learned for the first time, the MAC mobility sequence number is set to zero. The value zero indicates that the MAC address has not had a mobility event and the host is still at its original location. If a MAC mobility event is detected, a new Route type 2 (MAC/IP advertisement) is added to the BGP EVPN control plane by the new VTEP behind which the endpoint now sits. Every time the host moves, the VTEP that detects its new location increments the sequence number by 1 and then advertises the MAC route for that host on the BGP EVPN control plane. On receiving the MAC route at the old location, the old VTEP withdraws the old route.

A case may arise in which the same MAC address is simultaneously learned on two different ports. The EVPN control plane detects this condition and alerts the user that there is a duplicate MAC.
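The sequence-number logic can be sketched as follows. This is a simplified model, not a BGP implementation: the `MacRoute` and `EvpnMacTable` names are hypothetical, a single table stands in for the view that every VTEP converges on, and tie-breaking between routes with equal sequence numbers as well as duplicate-MAC detection are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacRoute:
    mac: str
    vtep: str
    seq: int  # MAC mobility sequence number, 0 when first learned


class EvpnMacTable:
    """Keeps the most current route per MAC, judged by sequence number."""

    def __init__(self):
        self.best = {}  # MAC -> MacRoute

    def local_learn(self, mac, vtep):
        """A VTEP learns a MAC on a local port and advertises a route for it."""
        current = self.best.get(mac)
        if current is None:
            seq = 0                  # first time this MAC is seen
        elif current.vtep == vtep:
            seq = current.seq        # same location, no mobility event
        else:
            seq = current.seq + 1    # the host moved: bump the sequence number
        route = MacRoute(mac, vtep, seq)
        self.receive(route)
        return route

    def receive(self, route):
        """Accept a route only if it is newer than what is already known."""
        current = self.best.get(route.mac)
        if current is None or route.seq > current.seq:
            self.best[route.mac] = route


table = EvpnMacTable()
original = table.local_learn("00:aa:bb:cc:dd:ee", "vtep1")
moved = table.local_learn("00:aa:bb:cc:dd:ee", "vtep2")

# A delayed copy of the old advertisement arrives after the move.
table.receive(original)
winner = table.best["00:aa:bb:cc:dd:ee"]
print(original.seq, moved.seq, winner.vtep)
```

The late copy of the old route loses the comparison, which is how the brief window with two candidate locations resolves to the newest one.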
The duplicate MAC condition may be cleared either by manual intervention or automatically when the MAC address ages out on one of the ports. BGP EVPN supports IP mobility in a similar manner to the way it supports MAC mobility. The principal difference is that an IP move is detected when the IP address is learned on a different MAC address, regardless of whether it was learned on the same port or a different port. A duplicate IP address is detected when the same IP address is simultaneously learned on two different MAC addresses, and the user is alerted when this occurs.

The number of entries is a concern, primarily because of mobility: as endpoints move around the network, the prefixes being learned and withdrawn create churn that puts a strain on the network. As this occurs, the upper protocols must converge, and as that happens, CPUs can hit their limits. It’s important to understand the number of endpoints within the network and accommodate it in the design accordingly, especially when dealing with dual-stack networks utilizing IPv4 and IPv6. Additionally, the design must consider, especially for the routed access approach, the number of entries on the access switches and the performance impact that thousands of wireless devices moving across the network might have. The last implication of withdrawing routes by sequence number is that convergence takes time; this should not be underestimated.

Segmentation is provided by Private VLANs. A private VLAN (PVLAN) divides a regular VLAN into logical partitions, allowing limited broadcast boundaries among selected port groups on a single Layer 2 Ethernet switch. The single Ethernet switch’s PVLAN capabilities can be extended over the BGP EVPN VXLAN, enabling the network to build a partitioned bridge domain between port groups across multiple Ethernet switches in the BGP EVPN VXLAN VTEP mode.
The integration of PVLAN with a BGP EVPN VXLAN network enables the following benefits:

◉ Micro-segmented Layer 2 network segregation across one or more BGP EVPN VXLAN switches.

◉ Partitioned and secured user-group Layer 2 network that limits communication with dynamic or static port configuration assignments.

◉ IP subnet pool conservation across BGP EVPN VXLAN network while extending segregated Layer 2 network across the Fabric.

◉ Conservation of Layer 2 Overlay tunnels and peer networks with a single virtual network identifier (VNI) mapped to Primary VLAN.
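For readers less familiar with PVLANs, the partitioning rule can be sketched in Python. The article does not spell out the port roles, so the promiscuous, isolated, and community roles below are the standard PVLAN ones added here for illustration, and the function and tuple layout are hypothetical.

```python
PROMISCUOUS = "promiscuous"
ISOLATED = "isolated"
COMMUNITY = "community"


def can_communicate(port_a, port_b):
    """Each port is a (role, community_id) tuple within one primary VLAN."""
    role_a, community_a = port_a
    role_b, community_b = port_b
    if PROMISCUOUS in (role_a, role_b):
        return True                        # promiscuous ports reach every port
    if role_a == COMMUNITY and role_b == COMMUNITY:
        return community_a == community_b  # only within the same community
    return False                           # isolated ports reach promiscuous ports only


gateway = (PROMISCUOUS, None)
camera = (ISOLATED, None)
hr_pc1 = (COMMUNITY, "hr")
hr_pc2 = (COMMUNITY, "hr")
eng_pc = (COMMUNITY, "eng")

print(can_communicate(camera, gateway), can_communicate(hr_pc1, eng_pc))
```

All of these ports share one primary VLAN and therefore one subnet, which is where the IP subnet conservation in the list above comes from.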

Source: cisco.com