As a junior network engineer at a university I wrote a lot of management scripts in Perl. I had scripts to do things such as check switchport configurations and upgrade switch code. Times have changed a lot since then. The university’s web server now runs in the cloud, rather than on my personal workstation, and Python has surpassed Perl as the scripting language du jour.
Network automation with Python is now a major focus, making Python an extremely important tool.
Today I’m going to show you how to use Python scripts hosted on the box and integrated into IOS. This is far more powerful than my earlier-career scripts, and I have some simple examples for PCI compliance, Dynamic DNS ACL updates, and configuration validation.
As with many things in IT, we seem to be continually oscillating between “centralized” and “distributed.” On-box hosting of Python scripts is an example of moving back toward distributed. My view on the argument is that it’s never about the extremes, but more about the balance—a bit like a pendulum constantly swinging as technology advances change what’s possible and practical.
Today, I want to demonstrate why Python scripts running on-box can be awesome. I also want to explain how easy it is, based on the application hosting environment we’ve just released. In addition, I’ll give some examples of how Python is even more powerful when combined with some of the existing IOS infrastructure such as Embedded Event Manager (EEM).
Why on-box Python?
There are three main advantages for Python scripts running on the device itself rather than externally.
◈ Scale: If I have a “sanity” script that I need to run regularly and it takes six seconds per device, then across 1,000 devices that is 6,000 seconds, or one hour and 40 minutes. I could run them in parallel, but that still consumes resources on my management station, processing the data. It also potentially transports lots of data back to the management station, only to discard much of it. An alternative is to distribute the work to the devices and get them to provide an update when they’re done.
◈ Security: Instead of having “utility” logins that connect into devices and export information for external processing, you can have the device process its data locally and just export the summary state. Data stays on the device, and fewer external connections are required.
◈ Autonomy: The biggest limitation of centralized processing is that it needs a network connection to the device. There is a set of use cases to modify device behaviour when it loses connectivity to other devices. This can only be done on-box.
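The scale arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the six seconds and 1,000 devices come from the example above, while the worker count of 50 and the function names are made up for illustration.

```python
import math


def serial_runtime_seconds(devices, seconds_per_device):
    """Wall-clock time when one station visits every device in turn."""
    return devices * seconds_per_device


def parallel_runtime_seconds(devices, seconds_per_device, workers):
    """Wall-clock time when the station runs `workers` sessions at once."""
    batches = math.ceil(devices / workers)
    return batches * seconds_per_device


def format_hms(seconds):
    """Render a duration in seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "{}h {:02d}m {:02d}s".format(hours, minutes, secs)


serial = serial_runtime_seconds(1000, 6)
print(serial, format_hms(serial))  # 6000 1h 40m 00s

# Hypothetical: 50 parallel sessions from the management station.
print(format_hms(parallel_runtime_seconds(1000, 6, workers=50)))  # 0h 02m 00s

# On-box: every device only pays for its own run.
print(format_hms(serial_runtime_seconds(1, 6)))  # 0h 00m 06s
```

The parallel figure looks attractive, but it ignores the CPU, memory, and bandwidth the management station needs to hold those sessions open, which is the cost the on-box approach avoids.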
To illustrate the “what,” I’ve provided some sample scripts and use cases for the three points above. The code is published at https://github.com/aradford123/on-box-python.git
To get started, make sure Git is installed on your network device by running the following commands (after you’ve enabled guestshell; see the section at the end for details). The scripts go into /flash/gs_script because it is a persistent directory that remains available after a switch stack switchover.
# install git
[guestshell@guestshell ~]$ sudo yum install git
# now install scripts into /flash/gs_script
[guestshell@guestshell ~]$ git clone https://github.com/aradford123/on-box-python.git /flash/gs_script
Example 1 – PCI compliance
Here’s an example of a use case that I think you’ll find interesting. One of our customers had a PCI requirement to ensure that any switch ports that were unused for more than seven days were disabled. (This was to prevent people from plugging in unauthorized devices.)
The script looks at all interfaces on the switch, and those that have been inactive (no traffic sent or received) for more than seven days are shut down. All interfaces that were shut down are logged in a Cisco Spark room. The interface description is updated with a message indicating the date and time it was shut down by the PCI-check application.
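The selection step can be sketched roughly as follows. This is not the code from the repository: it assumes the idle times have already been collected per port as strings in the styles IOS commonly prints (such as 00:00:03, 6d23h, 1w2d, or never), and the function names and sample ports are hypothetical.

```python
import re

SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
           "h": 3600, "m": 60, "s": 1}


def parse_idle(text):
    """Convert an IOS-style idle time to seconds; None means 'never'."""
    text = text.strip()
    if text == "never":
        return None
    clock = re.fullmatch(r"(\d+):(\d{2}):(\d{2})", text)
    if clock:
        hours, minutes, secs = (int(part) for part in clock.groups())
        return hours * 3600 + minutes * 60 + secs
    parts = re.findall(r"(\d+)([ywdhms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("unrecognised idle time: {!r}".format(text))
    return sum(int(n) * SECONDS[u] for n, u in parts)


def ports_to_disable(last_activity, max_idle_days=7):
    """Return ports whose input AND output have both been idle too long."""
    limit = max_idle_days * 86400
    stale = []
    for port, (last_in, last_out) in sorted(last_activity.items()):
        idle = [parse_idle(last_in), parse_idle(last_out)]
        # Assumption: 'never' counts as idle. A real check would also
        # compare against device uptime before shutting the port.
        if all(value is None or value > limit for value in idle):
            stale.append(port)
    return stale


# Hypothetical sample data: port -> (last input, last output).
sample = {
    "Gi1/0/1": ("00:00:03", "00:00:01"),
    "Gi1/0/2": ("1w2d", "1w2d"),
    "Gi1/0/3": ("never", "never"),
    "Gi1/0/4": ("6d23h", "2w0d"),
}
print(ports_to_disable(sample))  # ['Gi1/0/2', 'Gi1/0/3']
```

Gi1/0/4 is kept because it received traffic just under seven days ago, even though it has sent nothing for two weeks.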
To run the script, we’ll use the Embedded Event Manager. EEM is a really powerful piece of IOS infrastructure that can be used to schedule the Python script to run. The EEM cron job runs the Python script at 15 minutes past the hour, Monday to Friday.
event manager applet PCI-check
event timer cron cron-entry "15 * * * 1-5"
action 1.0 cli command "enable"
action 1.1 cli command "guestshell run python bootflash:gs_script/src/pci-tool/pci_check.py --apply"
This script can now be run hourly (instead of weekly). It’s an example of scaling using on-box Python.
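To make the cron entry concrete, here is a small sketch that tests whether a given time matches a five-field entry. It assumes the standard Unix field order (minute, hour, day of month, month, day of week, with 0 meaning Sunday) and supports only `*`, single values, lists, and ranges. The helper names are hypothetical and this is not how EEM itself evaluates the entry.

```python
from datetime import datetime


def _expand(field, low, high):
    """Expand one cron field ('*', '5', '1-5', '0,30') into a set of ints."""
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    return values


def cron_matches(entry, when):
    """True if `when` falls on a minute selected by a five-field entry.

    Simplification: all five fields must match, which is equivalent to
    standard cron whenever the day-of-month field is '*'.
    """
    minute, hour, day, month, weekday = entry.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (when.minute in _expand(minute, 0, 59)
            and when.hour in _expand(hour, 0, 23)
            and when.day in _expand(day, 1, 31)
            and when.month in _expand(month, 1, 12)
            and cron_weekday in _expand(weekday, 0, 6))


entry = "15 * * * 1-5"
print(cron_matches(entry, datetime(2017, 7, 12, 11, 15)))  # Wednesday 11:15 -> True
print(cron_matches(entry, datetime(2017, 7, 12, 11, 16)))  # wrong minute -> False
print(cron_matches(entry, datetime(2017, 7, 15, 11, 15)))  # Saturday -> False
```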
Example 2 – DNS ACL
Another customer had a requirement to keep an ACL updated with the latest DNS entries. For example, they wanted the ACL to reflect the real IP addresses of www.cisco.com and www.amazon.com.
This script automatically schedules its next execution based on the minimum Time To Live (TTL) of the DNS response. It ensures we don’t attempt to update more often than necessary.
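The scheduling idea can be sketched like this. It is a simplified illustration rather than the repository code: the DNS answers are assumed to be already resolved into (IP, TTL) pairs, the TTL values in the example are illustrative, and the `floor_seconds` guard is my own addition to avoid rescheduling too aggressively.

```python
def plan_update(answers, current_acl_ips, floor_seconds=30):
    """Return (ips_to_add, seconds_until_next_run).

    answers: {hostname: [(ip, ttl_seconds), ...]} from the latest lookups.
    current_acl_ips: set of IPs already permitted in the ACL.
    """
    to_add = []
    ttls = []
    for hostname in sorted(answers):
        for ip, ttl in answers[hostname]:
            ttls.append(ttl)
            if ip not in current_acl_ips and ip not in to_add:
                to_add.append(ip)
    # Re-run when the shortest-lived record expires, but not sooner
    # than the (hypothetical) floor.
    delay = max(min(ttls), floor_seconds) if ttls else floor_seconds
    return to_add, delay


# Illustrative answers; the TTLs here are made up for the example.
answers = {
    "cisco.com": [("72.163.4.161", 3600)],
    "amazon.com": [("54.239.25.208", 557), ("54.239.17.6", 557)],
}
already_in_acl = {"72.163.4.161"}

new_ips, delay = plan_update(answers, already_in_acl)
print(new_ips)  # ['54.239.25.208', '54.239.17.6']
print(delay)    # 557
```

Only addresses missing from the ACL are returned, so repeated runs do not generate duplicate entries.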
The script logs entries that are added to the ACL via syslog, but that could be Cisco Spark or any other notification mechanism.
show logging
*Jul 12 11:44:11.174: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "Looking up cisco.com"
*Jul 12 11:44:11.253: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "Looking up amazon.com"
*Jul 12 11:44:11.341: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 72.163.4.161 to ACL: status: Success"
*Jul 12 11:44:11.351: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.25.208 to ACL: status: Success"
*Jul 12 11:44:11.360: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.17.6 to ACL: status: Success"
*Jul 12 11:44:11.370: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.26.128 to ACL: status: Success"
*Jul 12 11:44:11.379: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.17.7 to ACL: status: Success"
*Jul 12 11:44:11.389: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.25.200 to ACL: status: Success"
*Jul 12 11:44:11.398: %SYS-5-USERLOG_NOTICE: Message from tty4(user id: ): "adding IP: 54.239.25.192 to ACL: status: Success"
*Jul 12 11:44:17.605: %SYS-5-USERLOG_NOTICE: Message from tty5(user id: ): "reschedule in : 557 seconds: status: Success"
Here is the resulting ACL. Notice how the remarks are used to indicate the time the ACL changed. The last two entries were added at a later date.
9300#show run | sec canary
ip access-list extended canary_ip_in
remark Added 72.163.4.161 @Wed Jul 12 11:44:11 2017
permit ip any host 72.163.4.161
remark Added 54.239.25.208 @Wed Jul 12 11:44:11 2017
permit ip any host 54.239.25.208
remark Added 54.239.17.6 @Wed Jul 12 11:44:11 2017
permit ip any host 54.239.17.6
remark Added 54.239.26.128 @Wed Jul 12 11:44:11 2017
permit ip any host 54.239.26.128
remark Added 54.239.17.7 @Wed Jul 12 11:44:11 2017
permit ip any host 54.239.17.7
remark Added 54.239.25.200 @Thu Jul 13 16:34:37 2017
permit ip any host 54.239.25.200
remark Added 54.239.25.192 @Thu Jul 13 16:34:37 2017
permit ip any host 54.239.25.192
deny ip any any
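A minimal sketch of how the remark and permit lines above could be rendered for one address. The function name is hypothetical, and it only builds the text; placing the new lines ahead of the final deny statement is left out.

```python
from datetime import datetime


def acl_entry_lines(ip, added_at):
    """Build the remark/permit pair for one newly resolved address."""
    stamp = added_at.strftime("%a %b %d %H:%M:%S %Y")
    return [
        "remark Added {} @{}".format(ip, stamp),
        "permit ip any host {}".format(ip),
    ]


for line in acl_entry_lines("72.163.4.161", datetime(2017, 7, 12, 11, 44, 11)):
    print(line)
# remark Added 72.163.4.161 @Wed Jul 12 11:44:11 2017
# permit ip any host 72.163.4.161
```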
This script uses a different type of EEM trigger, a countdown timer. The script self-updates the trigger based on the TTL of the DNS response. In the case below, it will fire in the next 557 seconds.
9300#show run | sec even
event manager applet DNS_update
event timer countdown time 557
action 1.0 cli command "enable"
action 1.1 cli command "guestshell run python bootflash:gs_script/src/dns-update/DNS_update.py cisco.com amazon.com"
This is an example of security using on-box Python. No external access to the device is required.
Example 3 – Configuration change
This example uses an EEM event to look for a syslog message and execute a Python script. In this case, it looks for a configuration event, and fires the script.
This script will do two things:
◈ A sanity check. This example is a simple test to see if an IP address is reachable, but it could be more sophisticated. If the sanity check fails, then the configuration is rolled back.
◈ Log the changes to the configuration in a Spark room.
This screenshot shows the configuration diff posted to a Spark room.
Here’s an example of the sanity check: I have a very simple sanity check that looks for connectivity to 1.1.1.1. While this example is trivial, the sanity check could be much more sophisticated (checking for OSPF neighbours, number of connected hosts, etc.).
I shut down the loopback address 1.1.1.1, which is being checked by the sanity function. The sanity function fails, triggering configuration rollback, and the current (working) configuration is restored.
9300(config)#int loopback 2
9300(config-if)#shut
9300(config-if)#exit
9300(config)#end
9300#
Jul 14 12:01:04.859: %SYS-5-CONFIG_I: Configured from console by vty0 (10.61.215.206)
Jul 14 12:01:15.648: %MLANG-3-LOG: config_check.py: Sanity
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Jul 14 12:01:15.690: Rollback:Acquired Configuration lock.
Jul 14 12:01:15.690: %SYS-5-CONFIG_R: Config Replace is Done
Jul 14 12:01:16.579: %MLANG-3-LOG: config_check.py:
Total number of passes: 1
Rollback Done
Jul 14 12:01:18.127: %LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback2, changed state to up
Jul 14 12:01:18.127: %LINK-3-UPDOWN: Interface Loopback2, changed state to up
There are lots of other options for this script, including checking into a git repository and more enhanced sanity checks.
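The check-then-rollback flow in the log above can be sketched as follows. This is an illustration under stated assumptions, not the contents of config_check.py: `run_cli` stands in for whatever the script uses to issue IOS exec commands, the 80 percent threshold is arbitrary, and the file name `flash:last_good.cfg` is hypothetical.

```python
import re


def ping_success_percent(ping_output):
    """Extract the success percentage from IOS ping output."""
    match = re.search(r"Success rate is (\d+) percent", ping_output)
    if match is None:
        raise ValueError("no success-rate line found in ping output")
    return int(match.group(1))


def check_and_rollback(run_cli, target="1.1.1.1", min_percent=80):
    """Ping `target`; request a rollback if the success rate is too low.

    run_cli is any callable that takes an IOS exec command and returns
    its output as text, so the logic can be exercised off-box.
    """
    output = run_cli("ping {}".format(target))
    if ping_success_percent(output) >= min_percent:
        return "ok"
    # Hypothetical file name for the last known-good configuration.
    run_cli("configure replace flash:last_good.cfg force")
    return "rolled back"


# Off-box demonstration with a fake CLI that replays the failed ping above.
SAMPLE = (
    "Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:\n"
    ".....\n"
    "Success rate is 0 percent (0/5)\n"
)
issued = []


def fake_cli(command):
    issued.append(command)
    return SAMPLE if command.startswith("ping") else ""


print(check_and_rollback(fake_cli))  # rolled back
print(issued[-1])                    # configure replace flash:last_good.cfg force
```

Passing the CLI runner in as a parameter keeps the decision logic testable on a laptop before the script is copied to a switch.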
This Python script uses EEM in a different way. The first line of the script embeds an EEM registration. If there is a syslog message with pattern “CONFIG_I” in it, the script will be executed. (NOTE: This is actually a Python script, and the normal Python code is after this.)
::cisco::eem::event_register_syslog pattern "CONFIG_I" maxrun 60
# this is an example of an EEM policy trigger
# based on Joe Clarke's version
#https://github.com/CiscoDevNet/python_code_samples_network/blob/master/eem_configdiff_to_spark/sl_config_diff_to_spark.py
I then tell EEM to look for the registration in the config_check.py script.
event manager directory user policy flash:
event manager policy config_check.py
This is an example of autonomy; if the configuration is changed, a sanity check is run to make sure the device is still functioning (and connected) to the network. If the sanity check fails, the configuration is rolled back.
Upgrade sanity check
It’s pretty simple to extend the use case above to check the status of the device before downloading and installing new software. Once the new code is installed, it will re-run the sanity check and either remove the old version of code or roll back depending on the status of the sanity check.
This is another example of autonomy using on-box Python.
How does this really work?
Python runs in a guestshell on the device. The guestshell is a CentOS or MontaVista shell running as an application on the device. To enable application hosting, you need to use the application hosting framework, IOX. Enabling IOX is quite easy:
9300# conf t
9300(config)#iox
You then need to enable guestshell. This will take a few seconds.
9300# guestshell enable
Management Interface will be selected if configured
Please wait for completion
On switches running 16.8.1 and later, you need to configure a management interface, as shown below. This was added to allow access to guest shell from ports other than the management interface (GigabitEthernet0/0).
9300# conf t
app-hosting appid guestshell
app-vnic management guest-interface 0
end
9300# guestshell enable
Management Interface will be selected if configured
Please wait for completion
Once it has started, you can either run a command or get an interactive shell session.
9300#guestshell run echo "hello world"
hello world
9300#guestshell
[guestshell@guestshell ~]$
The very first thing you will do is update the DNS settings. You can use vi, or just a simple echo statement.
echo -e "nameserver 8.8.8.8\ndomain cisco.com" > /etc/resolv.conf
Now to install some Python modules. This is pretty simple. Just use pip install. I am using the “-E” option as my switch needs a proxy to get to the internet.
[guestshell@guestshell ~]$ sudo -E pip install netaddr
Collecting netaddr
Downloading netaddr-0.7.19-py2.py3-none-any.whl (1.6MB)
100% |################################| 1.6MB 257kB/s
Installing collected packages: netaddr
Successfully installed netaddr-0.7.19
DevOps
The next question you’re asking is how do I keep the scripts on the device updated? It would be a pain to have to copy/install new scripts all the time.
The solution is pretty simple. Store the scripts in a git repository (so you have full version control), then use an EEM script to “git pull” regularly to keep them updated.
Here’s a simple git update script:
[guestshell@guestshell ~]$ cat /flash/gs_script/utils/update_git.sh
#!/bin/bash
(cd /flash/gs_script; git pull)
All that’s required is another EEM cron job to keep the device updated with the latest git repository.
event manager applet GIT-sync
event timer cron cron-entry "0,30 * * * *"
action 1.0 cli command "enable"
action 1.1 cli command "guestshell run /bootflash/gs_script/src/util/update_git.sh"
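If you would rather drive the update from Python than from a shell script, a rough equivalent might look like this. It assumes Python 3.5 or later in the guestshell; the function name and the `runner` parameter are hypothetical, and `--ff-only` is my own choice so that a diverged checkout fails instead of creating a merge commit.

```python
import subprocess


def update_repo(repo_dir="/flash/gs_script", runner=subprocess.run):
    """Fast-forward the script checkout; return True if git exited cleanly."""
    result = runner(
        ["git", "pull", "--ff-only"],
        cwd=repo_dir,                 # same effect as 'cd repo_dir' first
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0


# Example use on the device (not executed here):
#     if not update_repo():
#         print("git pull failed; scripts were left unchanged")
```

Setting `cwd` rather than using `git -C` keeps the call compatible with older git releases that lack that option.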