Two years ago, we set out on a quest to tune Cisco Network Services Orchestrator (NSO) for massive deployments. The primary challenge was transaction throughput, since no one wants a network that is slow or unresponsive. Before you know it, customers will shout “Make your code run faster” or “My system is hanging”.
Today we are happy to announce a significant performance boost. I would almost dare to say that NSO 6.0 is “The Perfected Sword.” The magic lies in NSO Release 6.0 and its reimagined Transaction Manager. When we started the project, we knew that our best attribute was both our greatest enemy and our biggest potential. We were challenged because we had to perfect the very thing that made us who we are. Now we are proud to claim that you will get three (3) times faster transaction throughput just by upgrading the software, and up to nine (9) times faster if you also optimize your code. If you are new to NSO and don’t care about the history, you can stop reading now and enjoy the new version!
For those of you who have been with us for a while, or who may have struggled to scale with NSO, I will add a few layers to the history. If you want to know even more and get hands-on, sign up for our next Automation Developer Days, Nov 29-30 in New York!
Shaping NSO for Increasing Demand
With ever-growing network demand, we knew we had to be radical. Future networks need to push through more transactions per second than ever before, and our attempts to help customers optimize their code inside the lock were not enough. We knew we could increase concurrency and performance if we could reduce the time during which transactions are protected (a.k.a. the code lock). That would simply let us use the available processing power more efficiently.
Things we did in the protected phase:
◉ FASTMAP create code, which can be more or less efficient.
◉ Validations are model-driven constructs such as must, when, leafref, etc. These can be time-consuming.
◉ Kicker evaluations, which can be more or less efficient.
◉ Device communication is normally time-consuming.
A transaction in NSO 5.x and earlier
It was tough to realize, but the very merits that make NSO unique can also hurt performance at scale. We cannot expect users to write perfect validation expressions just because we know how. We also understood that we could not achieve a sufficient gain unless we challenged the NSO heritage and broke transaction integrity just enough to release that power. Transaction integrity is what makes our transactions fail-safe, but it also limits parallelism.
Can we run without locks, or can we make the lock shorter? Either way, we must manage any code that runs unprotected without adding so much complexity that it eats up the cycles we gain.
The New Concurrency Model
We put a lot of research into the new design and the parts that control concurrency. The Transaction Manager is the central piece of this project. It is a dedicated function outside the database (CDB) that contains all the functionality needed for features such as FASTMAP.
The Transaction Manager controls the concurrency in NSO.
We knew that we could do much more in parallel if we applied “checking” instead of “locking”. We just need to verify that the create condition is still valid when we commit. Service invocations, validations, rollback file creation, and more could potentially run outside the lock if we found a way to detect interference. We moved from a pessimistic to an optimistic view of the transaction to optimize concurrency.
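To make the shift from locking to checking concrete, here is a minimal Python sketch of where the lock sits in each model. It is an illustration under stated assumptions, not NSO code: the `Store` class, its global `version` counter, and the `create` callback are hypothetical stand-ins for CDB, the commit-time condition, and FASTMAP create code. The pessimistic variant holds the lock while the (possibly slow) create code runs; the optimistic variant runs it unlocked and takes the lock only for a short check-and-apply step, rerunning the create code if another commit got in first.

```python
import threading


class Store:
    """Hypothetical stand-in for the configuration database."""

    def __init__(self):
        self.data = {}
        self.version = 0              # bumped on every successful commit
        self.lock = threading.Lock()  # the critical section


def commit_pessimistic(store, create):
    # Old model: the create code runs inside the critical section,
    # so no other transaction can commit while it executes.
    with store.lock:
        changes = create(dict(store.data))
        store.data.update(changes)
        store.version += 1


def commit_optimistic(store, create, max_retries=10):
    # New model: run the create code outside the lock, then check that
    # nothing was committed in the meantime before applying the result.
    for _ in range(max_retries):
        with store.lock:              # take a consistent snapshot
            start_version = store.version
            snapshot = dict(store.data)
        changes = create(snapshot)    # unprotected, may be slow
        with store.lock:              # short critical section
            if store.version == start_version:
                store.data.update(changes)
                store.version += 1
                return True
        # Another commit got in first, so the create condition may no
        # longer hold: rerun the create code on fresh data.
    return False
```

The version counter is the coarsest possible check: any concurrent commit forces a rerun, even one that touched unrelated data. Finer-grained checks reduce those unnecessary restarts.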
A transaction in NSO 6.0
Conflict detection is one way to verify these conditions at commit time, and it is the basis for our new programming paradigm. We basically compare the current transaction’s read-set with the write-sets of other transactions that completed while it was running. If another transaction has changed something the current transaction read, the current transaction must abort and its services are restarted. In this way, we protect existing services from being rewritten. Pretty straightforward, right? Of course, if you do your part to keep your code conflict-free, you will avoid service restarts and NSO can run at full speed.
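The read-set/write-set comparison can be sketched in a few lines of Python. This is a simplified, single-threaded model written for illustration: the `ConflictDetectingStore` class, the string paths, and the `run_service` retry helper are hypothetical, and in a real system `try_commit` would run inside the short critical section. Each transaction records the paths it reads; at commit, it is checked against the write-sets of transactions that committed after it started.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    start_seq: int                                 # last commit seen at start
    read_set: set = field(default_factory=set)
    write_set: dict = field(default_factory=dict)


class ConflictDetectingStore:
    """Hypothetical model of commit-time conflict detection."""

    def __init__(self):
        self.data = {}
        self.commit_seq = 0
        self.history = []   # (commit sequence number, frozenset of written paths)

    def begin(self):
        return Txn(start_seq=self.commit_seq)

    def read(self, txn, path):
        txn.read_set.add(path)        # remember everything the code looked at
        return self.data.get(path)

    def write(self, txn, path, value):
        txn.write_set[path] = value   # buffered until commit

    def try_commit(self, txn):
        # Compare this transaction's read-set with the write-sets of every
        # transaction that committed after it started.
        for seq, written in self.history:
            if seq > txn.start_seq and written & txn.read_set:
                return False          # conflict: abort, caller reruns
        self.data.update(txn.write_set)
        self.commit_seq += 1
        self.history.append((self.commit_seq, frozenset(txn.write_set)))
        return True


def run_service(store, create, max_retries=5):
    # Rerun the (hypothetical) service create code until it commits cleanly.
    for _ in range(max_retries):
        txn = store.begin()
        create(store, txn)
        if store.try_commit(txn):
            return True
    return False
```

A transaction that reads `/devices/r1/vlan` while another one commits a change to that path is rejected, whereas transactions touching disjoint paths both commit. The sketch keeps the whole commit history for simplicity; a real implementation would prune entries older than the oldest running transaction.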
Another, less surprising, example is the commit queue option, which proved very useful for moving device communication outside the lock and removing dependencies.
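Moving device communication out of the lock follows a classic producer/consumer pattern. The Python sketch below is a generic illustration, not NSO's commit queue implementation; `CommitQueueSketch` and the `push_to_device` callback (standing in for slow device I/O) are hypothetical. The critical section only updates the configuration and enqueues the device work, and a background worker pushes the changes to devices after the lock has been released.

```python
import queue
import threading


class CommitQueueSketch:
    """Hypothetical: the critical section only enqueues device work."""

    def __init__(self, push_to_device):
        self.lock = threading.Lock()
        self.config = {}
        self.failed = []                       # (device, changes, exception)
        self.pending = queue.Queue()
        self.push_to_device = push_to_device   # slow device I/O
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def commit(self, device, changes):
        with self.lock:                        # short critical section
            self.config.setdefault(device, {}).update(changes)
            self.pending.put((device, dict(changes)))
        # Returns without waiting for the device to answer.

    def _drain(self):
        # Single worker, so device pushes happen in commit order.
        while True:
            device, changes = self.pending.get()
            try:
                self.push_to_device(device, changes)   # outside the lock
            except Exception as exc:
                self.failed.append((device, changes, exc))
            finally:
                self.pending.task_done()

    def wait_until_pushed(self):
        self.pending.join()
```

Because the worker runs outside the lock, a slow or unreachable device no longer holds up other transactions. The trade-off is that device failures surface asynchronously, which is why this sketch records them in `failed` instead of raising them to the committer.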
Unexpected Outcomes
The Transaction Manager is, for good reason, probably one of the most thoroughly tested code sections in NSO. Changing the core architecture can of course be risky: when you start poking around, you have to roll up your sleeves and fix old bugs as you run into them. The upside can be equally motivating, as unexpected gains materialize.
◉ Lockless dry-run is one of them. Dry-run transactions never enter the critical section, not even in LSA (Layered Service Architecture) deployments. This affects most actions with the dry-run option, as well as the service actions check-sync, get-modifications, and deep-check-sync.
◉ Improved device locking is another one, which allows us to obsolete the wait-for-device commit parameter. Devices are locked automatically before entering the critical section, which simplifies both code and operations.
◉ Improvements backported to the NSO 5.x branch:
    ◉ Improved commit queue error recovery
    ◉ Internal performance improvements in CDB
    ◉ Performance improvement for kicker evaluation
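The first two outcomes above, lockless dry-run and automatic device locking, can be illustrated together. The following Python sketch is hypothetical and greatly simplified (`TxnManagerSketch` and its `(device, key)` configuration layout are assumptions, not NSO internals): `commit` locks the touched devices before entering the critical section, while `dry_run` computes the same diff from the current snapshot without taking any lock. Publishing a new dictionary on every commit, instead of mutating the old one, is what lets the dry-run read a consistent snapshot lock-free.

```python
import threading
from contextlib import ExitStack


class TxnManagerSketch:
    """Hypothetical model: per-device locks, one critical section,
    and snapshot-based dry-runs. Not NSO code."""

    def __init__(self):
        self.critical_section = threading.Lock()
        self._device_locks = {}
        self._guard = threading.Lock()   # protects _device_locks
        self.config = {}                 # (device, key) -> value; replaced, never mutated

    def _lock_for(self, device):
        with self._guard:
            return self._device_locks.setdefault(device, threading.Lock())

    @staticmethod
    def _diff(base, changes):
        # Entries whose value would actually change: key -> (old, new).
        return {k: (base.get(k), v) for k, v in changes.items() if base.get(k) != v}

    def commit(self, changes):
        devices = sorted({device for device, _ in changes})
        with ExitStack() as stack:
            # Lock the touched devices first, in sorted order, so two
            # transactions with overlapping devices cannot deadlock.
            for device in devices:
                stack.enter_context(self._lock_for(device))
            with self.critical_section:
                base = self.config
                diff = self._diff(base, changes)
                # Publish a new dict so lock-free readers always see a
                # consistent snapshot.
                self.config = {**base, **changes}
                return diff

    def dry_run(self, changes):
        # Same diff, but no device locks, no critical section, no writes.
        # Reading self.config is a single reference read of a dict that
        # is never mutated after it has been published.
        return self._diff(self.config, changes)
```

Acquiring device locks in sorted order is a design choice of this sketch, used to prevent deadlocks between transactions that touch overlapping sets of devices.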
Sometimes it Pays Off to Dare a Little More
Sometimes it is worth taking the more advanced path to reach a goal. Once you know it works, you can simplify and evaluate. Now we challenge you to upgrade to NSO 6.0 and optimize your software for faster transaction throughput. To learn more, I highly recommend the new Packet Pushers podcast that uncovers the new features in NSO 6.0. As a next step, come to Developer Days in New York in November to learn the details and how you can gain performance with NSO 6.0. You will dive deeper into this topic in hands-on coding sessions led by our experts. If you can’t come to New York, or want to come prepared, you can always check out the NSO YouTube Channel for the latest content. We have two sessions in particular on the new concurrency model from our previous event in Stockholm: an overview session that explains what we have done, and a deep-dive session that focuses on the conflict detection algorithm.
Source: cisco.com