This is a question the Technology Experiences Team (TechX), Cisco’s dedicated team of infrastructure engineers and project managers, asked themselves this year. When our annual, in-person conference suddenly went virtual, it rendered our hardware a little redundant. So, what do we do with the technology we’d usually deploy for our customers at events?
TechX is chartered with supporting events and trade shows throughout the calendar year. It is our fun and often exhilarating task to implement Cisco’s technologies, and sometimes our very latest solutions. Supporting our customers, event staff, and partners to host Cisco Live, and building an enterprise-class network for 28,000+ people in just a few days, is certainly an undertaking.
With no physical events this year, all that amazing Cisco technology is suddenly useless, right? Well, fortunately not. My job within the team is to build out and support the Data Center (DC) for our shows. The DC is home for all those applications that make the event and supporting it a success. Our applications portfolio includes: Cisco Identity Services Engine (ISE), Cisco Prime Network Registrar (CPNR – DNS/DHCP), Cisco DNA Center, virtual Wireless LAN controllers, FTP, Cisco Network Services Orchestrator (NSO), Data Center Network Manager (DCNM), vCenter, various flavors of Linux based on our engineers’ preference, NTP, Active Directory, Remote Desktop Services, Application Delivery Controllers (ADC), Cisco Video Surveillance Manager, Grafana, NetApp Snap Center, Ansible hosts, Mazemap Lipi server, Find my Friends server, webhook servers, database hosts, and the list goes on.
What did we do with a DC that supports all of those wonderful applications, you may well ask? Well, we did two things. First, we deployed Folding@home virtual machines. As many of you will know, Folding@home is a distributed network of compute power that uses almost any machine to crunch numbers, helping scientists at Stanford University work toward cures for diseases. What better use of a large Data Center? Not only are we repurposing our infrastructure instead of retiring it, we’re doing our part to help with a healthcare crisis. In fact, Cisco as a whole is using compute power from across the company to contribute, and you can see our progress with the Folding@home project. Cisco’s team ID is 1115, and our group is called CiscoLive2016, as that’s the show where we first deployed Folding@home.
Other important questions arise from this, such as:
◉ What are we using to host Folding@home?
◉ How did we deploy the virtual machines?
◉ How are we monitoring our compute?
◉ How do we monitor our progress in terms of the Folding@home project?
What are we using to Host Folding@home?
We deploy two types of compute cluster at Cisco Live: a traditional data center solution with storage and blade servers (UCS B-Series), known as a FlexPod, and a hyperconverged cluster known as Cisco HyperFlex. The FlexPod is a collaborative solution that combines VMware’s vSphere virtualization software, NetApp’s storage clusters, Cisco’s UCS blade servers, and Nexus data center switches. In this case we’re using UCS B200 M4 blades split across two chassis, for a total of 16 blades, combined with a NetApp MetroCluster IP. The MetroCluster is a fully redundant storage system that replicates all data between two arrays, so if you lose one, the other allows you to recover your data. Typically the two arrays are installed at different locations, which isn’t possible at Cisco Live due to space and cabling restrictions. You’ll see how we configure it below.
The MetroCluster ships with two Nexus 3232C switches to create the IP connectivity between both clusters. The UCS chassis use a boot-from-SAN method to load their ESXi OS from the MetroCluster IP. Thanks to UCS service profiles, if we lose a blade we can simply replace it and boot the exact same operating system used by the old host, without reinstalling ESXi. A service profile is essentially a set of variables that make a host or server operable, including the UUID, MAC addresses, WWPNs and many other pieces of information. When we insert a new blade, it takes on the identity of the old blade using the information held in the profile. This allows it to masquerade as the old host and permits a compute hot-swap. Here’s a basic diagram of our design.
FlexPod Design Diagram
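For those curious what a service profile looks like outside the GUI, here’s a minimal sketch, assuming Cisco’s ucsmsdk Python library and placeholder UCS Manager credentials, that simply lists the service profiles in a domain along with the identity each one carries. It isn’t part of our event tooling, just an illustration.

```python
# Minimal sketch: list UCS service profiles with the ucsmsdk library.
# The UCS Manager address and credentials below are placeholders.
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.example.com", "admin", "password")  # placeholder host/creds
handle.login()

# "LsServer" is the UCS managed-object class for service profiles.
for sp in handle.query_classid("LsServer"):
    # Print the name, UUID and association state each profile carries.
    print(sp.name, sp.uuid, sp.assoc_state)

handle.logout()
```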
How are we monitoring our Compute?
The other awesome thing about Cisco’s compute platform is that we have a cloud-based monitoring system called Cisco Intersight. We use it each year to ensure our servers are running without errors. You can also access the servers’ management interface, UCS Manager, from Intersight, making it a consolidated GUI across multiple sites or deployments. Here’s a dashboard screen capture of how that looks. We actually have an error on one host, which I need to investigate further. It’s great to have a monitoring system, especially whilst we’re all working from home.
How did we deploy the Virtual Machines?
Being a busy guy, I didn’t want to manually deploy all 40 virtual machines (VMs) and carry out a lot of error-prone typing of hostnames, IP addresses and VM-specific parameters, bearing in mind there would be a great deal of repetition, as each VM is essentially the same. Instead, I decided to automate the deployment of all the VMs. The great news is that some of the work has already been done: VMware has produced a Folding@home OVA image running their Photon OS. The image is optimized to run on ESXi and can be installed using OVA/OVF parameters. These are basically settings, such as the IP address, hostname and information specific to the Folding@home software install, gathered prior to installation. There are installation instructions covering the deployment, both in blog posts and in the download itself. Please see the link at the end of this post.
Using Python scripting and VMware’s ovftool, a command-line tool for deploying OVF/OVA files, I was able to take the image and pass all the OVA parameters to ovftool. ovftool then builds a VM on a specified host with all of your desired settings. Using Python, I can loop over these parameters x number of times, in my case forty, and execute the ovftool command forty times. This was a joy to watch, as VMs suddenly started to appear in my vCenter and I could sit back and drink my cappuccino.
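For illustration, here’s a simplified sketch of that loop. The vCenter target, datastore, network and OVF property names are placeholders and assumptions rather than the exact values from our deployment.

```python
# Simplified sketch: loop over per-VM settings and shell out to VMware's ovftool
# to deploy each Folding@home appliance. Target, datastore, network and property
# names are placeholders/assumptions, not the exact values from our build.
import subprocess

OVA = "folding-at-home.ova"
TARGET = "vi://administrator%40vsphere.local:password@vcenter.example.com/DC/host/Cluster"

for i in range(1, 41):                        # forty VMs in our case
    name = f"folding-vm-{i:02d}"
    ip = f"10.0.100.{i + 10}"                 # hypothetical addressing scheme
    cmd = [
        "ovftool",
        "--acceptAllEulas",
        "--noSSLVerify",
        "--powerOn",
        f"--name={name}",
        "--datastore=Datastore01",            # placeholder datastore
        "--network=VM-Network",               # placeholder port group
        f"--prop:guestinfo.hostname={name}",  # property keys assumed for illustration
        f"--prop:guestinfo.ipaddress={ip}",
        OVA,
        TARGET,
    ]
    subprocess.run(cmd, check=True)           # raises if a deployment fails
```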
After the installation, I was able to monitor how our hosts were running using VMware’s vCenter. Using Folding@home’s largest VM installation, which uses more processing power, I was able to push our cluster to around 75% CPU utilization on each host, as can be seen below. Some hosts were spiking a little, so I needed to make some adjustments, but we continued to crunch numbers and use our otherwise idle compute for a greater good.
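If you’d rather not watch the vCenter UI, the same utilization figures can be pulled programmatically. Here’s a minimal sketch, assuming pyVmomi and placeholder vCenter credentials.

```python
# Minimal sketch: report per-host CPU utilization from vCenter with pyVmomi.
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab-style certificate handling only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
content = si.RetrieveContent()

# Walk every ESXi host in the inventory and print CPU usage as a percentage.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    hw = host.summary.hardware
    used_mhz = host.summary.quickStats.overallCpuUsage  # MHz currently in use
    total_mhz = hw.cpuMhz * hw.numCpuCores               # total CPU capacity in MHz
    print(f"{host.name}: {used_mhz / total_mhz:.0%} CPU")

Disconnect(si)
```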
How do we monitor our progress in terms of the Folding@home project?
Digging into Folding@home, I learned the project has an Application Programming Interface, or API, which allows access to the statistics programmatically. Again, using Python alongside InfluxDB and Grafana, I was able to create a dashboard the team could view to monitor our progress. Here’s a sample that I’ve annotated with numbers so we can refer to each statistic individually, followed below the list by a sketch of how the stats can be collected.
1. The team’s work units, the amount of data crunched over time
2. The score assigned to our team over time
3. Cisco Systems’ group position out of all companies contributing to the project
4. Our own position within the Cisco Systems group
5. TechX’s work units as a numerical value
6. TechX’s score as a numerical value
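And here’s a hedged sketch of the collection side. The endpoint path and JSON field names are my reading of the public Folding@home stats API rather than guaranteed, and the InfluxDB host and database are placeholders; Grafana then simply graphs the resulting measurement.

```python
# Hedged sketch: poll the Folding@home stats API for a team's numbers and write
# them to InfluxDB for Grafana to graph. Endpoint path and JSON field names are
# assumptions; the InfluxDB details are placeholders.
import time
import requests
from influxdb import InfluxDBClient

TEAM_ID = 1115  # Cisco's team ID
client = InfluxDBClient(host="influx.example.com", port=8086, database="folding")

while True:
    resp = requests.get(f"https://api.foldingathome.org/team/{TEAM_ID}", timeout=30)
    resp.raise_for_status()
    stats = resp.json()

    client.write_points([{
        "measurement": "folding_team_stats",
        "fields": {
            # Field names ("score", "wus", "rank") are assumed for illustration.
            "score": int(stats.get("score", 0)),
            "work_units": int(stats.get("wus", 0)),
            "rank": int(stats.get("rank", 0)),
        },
    }])
    time.sleep(300)  # poll every five minutes
```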
I was going to go into what we used our HyperFlex for, but I’ll leave that to another article, as this one is getting a little long!