Thursday 18 June 2020

Webex Cloud-Connected UC: Add the Power of Cloud to Your On-Premises UC Systems


Enabling the Workforce with Cisco Webex Cloud-Connected UC


The last ninety days have transformed business operations, globally. Enterprises are adjusting to the new “work from home” reality and are working around the clock to enable their workforce with the tools they need to be successful. Customers and partners are looking for an easy way to understand how their unified communications usage pattern is changing and how their network is handling the increased traffic load. The need for a single-pane view of business and operational insights is more important now than ever.

This is the first of a two-part blog series in which we discuss how Cisco is meeting this need through the introduction of Cisco Webex Cloud-Connected UC, a suite of cloud services that provides global business and operational visibility for our unified communications (UC) customers.

The Unified Communications Landscape


Cisco is the market share leader for on-premises UC, in large part because we have the broadest and deepest UC portfolio. We have helped many on-premises customers migrate their calling and messaging workloads to Cisco cloud offers: Cisco Webex Calling, Cisco Hosted Collaboration Solution (HCS), or Cisco Unified Communications Manager Cloud. For a variety of reasons, however, many customers intend to continue operating their UC applications on-premises, in their own or their partners' data centers. We continue to invest and innovate to deliver compelling value for these customers.

Cisco Webex Cloud-Connected UC is one such innovation that enables on-premises customers to reap the benefits of cloud-delivered services while retaining their on-premises investments. For them, it is the best of both worlds: powerful, agile, innovative management delivered from the cloud, serving their own secure, reliable, private UC platform.

What is Cisco Webex Cloud-Connected UC?


Cisco Webex Cloud-Connected UC is a suite of cloud services that allows Cisco Unified Communications on-premises customers and partners to connect the UC deployments in their data centers to the Cisco Webex cloud and consume value-added services from it. These services include:

1. Management services, which improve the experience of the UC administrator, and
2. UC supplemental services, which improve the experience of enterprise end users.

In this blog, we will focus on management services. We will cover supplemental services in part two of this series.

UC Management from Webex Control Hub


Our UC customers have been asking for an administrative tool that gives them a centralized, single-pane view of their entire deployment across regions, to simplify administration. Cisco and other industry vendors offer various UC management products. While these tools were fine for handling management requirements in simpler times, they fall short when compared with the latest cloud technologies and the rapid innovation cycle of cloud services.

Cloud-Connected UC addresses this by centralizing and simplifying the on-premises management workflows in Cisco Webex Control Hub. Control Hub can now bring you the “single-pane view” for on-premises deployments, in addition to the centralized management it already provides for Webex cloud and edge services.

The initial release of Cloud-Connected UC offers the foundational ability to connect on-premises systems to the cloud to get a rich, global cloud analytics service dashboard. We will build upon this foundation by adding various operational workflows to simplify the life of the UC administrator.


Webex Cloud-Connected UC Analytics


Webex Cloud-Connected UC Analytics provides historical insights into business and operational metrics for customer and partner IT administrators, as well as for IT buyers. It serves the IT mission by answering the following questions:

For IT Administrators:

◉ What is the operational health of my clusters & servers?
◉ What is the Quality of Experience for my users?
◉ Are there any issues I need to act upon (expired certificates, missing security fixes, etc.)?

For IT buyers:

◉ How are my assets being utilized? How can I optimize my asset usage?
◉ Are users adopting and engaging with new endpoints and features?
◉ Do I need to increase my service capacity to prepare for increased usage?

Tying back to the increased “work from home” usage situation that we started with, the analytics dashboard also provides enterprises the necessary insights to enable work from home usage and manage their users’ return to the office.

Cloud-Connected UC Analytics is built with intelligence to recognize patterns in the historical data, identify potential issues, recommend resolutions, and proactively notify the administrator. We are continuously exploring new ways to leverage cloud technologies, like Artificial Intelligence, Machine Learning, and Natural Language Processing to simplify the life of an administrator. The possibilities are practically unlimited.

Tuesday 16 June 2020

Smart Parking: A Cisco IoT Solution with LoRaWAN


I’m going to give you a behind-the-scenes look at the architecture of this small but real IoT application. It shows an easy way to get a digital output from an analog action. But first, let me introduce the problem and the solution components.

Do you know the feeling when you’re in a large parking garage looking for an empty parking space? You’re circling around in your car. Perhaps you’re late! You know there’s an empty spot somewhere. But where?!

There’s a Cisco IoT solution for that

Well, there is a Cisco IoT solution for that, which we implemented for the e-parking spaces at our Cisco office in Frankfurt, Germany. There, we have four parking spaces where you can charge your e-car. That’s good, but four spaces are too few to meet demand, and they fill up quickly. To solve the problem, we implemented a solution using LoRaWAN parking sensors. The solution helps our visitors and employees with the following:

Website

On the website, users can check live data on which parking spaces are empty or occupied.


Web-Dashboard with historical data

Because the data is stored in a time-series database (InfluxDB), the web dashboard (Grafana) can show the number of parking processes per parking space and per time/date. The dashboard also shows that, because of the COVID-19 crisis, nobody has gone to the office in recent months.
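To make "parking process" concrete, here is a minimal Python sketch of the counting a dashboard query would perform. It assumes a parking process starts whenever a space changes from free to occupied, and that sensor readings arrive as hypothetical `(timestamp, space_id, occupied)` tuples sorted by time. In the actual setup Grafana queries InfluxDB directly; this only illustrates the logic.

```python
from collections import defaultdict
from datetime import datetime

def count_parking_processes(events):
    """Count parking processes per (space, date).

    Assumption: a new parking process starts on each free -> occupied transition.
    `events` is an iterable of (timestamp, space_id, occupied), sorted by timestamp.
    """
    last_state = {}            # space_id -> last known occupied flag
    counts = defaultdict(int)  # (space_id, date) -> number of parking processes
    for ts, space, occupied in events:
        # Repeated "occupied" readings belong to the same process and are not recounted.
        if occupied and not last_state.get(space, False):
            counts[(space, ts.date())] += 1
        last_state[space] = occupied
    return dict(counts)

events = [
    (datetime(2020, 6, 1, 8, 0), "P1", True),
    (datetime(2020, 6, 1, 8, 5), "P1", True),   # repeated reading, same process
    (datetime(2020, 6, 1, 9, 0), "P2", True),
    (datetime(2020, 6, 1, 12, 0), "P1", False),
    (datetime(2020, 6, 1, 13, 0), "P1", True),  # second process on P1
]
result = count_parking_processes(events)
```

Counting transitions rather than raw readings keeps periodic sensor heartbeats from inflating the numbers.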


Proactive and reactive Webex Teams Bot

Users are notified via push messages when only one parking space is still available and when all parking spaces are occupied. It also works the other way around: users can ask the ParkingBot which parking spaces are empty or occupied.
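The bot's two behaviors can be sketched as plain Python. Under one assumption, proactive messages fire only when the free-space count changes into "one left" or "none left", so users are not spammed with repeats of the same state. The reactive reply formats the current state. The function names and message texts are hypothetical, and sending the message through Webex Teams is deliberately left out.

```python
def notification_for(previous_free, current_free):
    """Return a proactive push message, or None if nothing should be sent.

    Hypothetical rule: notify only when the free count changes to 1 or to 0.
    """
    if current_free == previous_free:
        return None
    if current_free == 1:
        return "Only one parking space is still available."
    if current_free == 0:
        return "All parking spaces are occupied."
    return None

def reply_to_query(spaces):
    """Answer a reactive 'which spaces are free?' question.

    `spaces` maps a space id to True if the space is occupied.
    """
    free = sorted(s for s, occupied in spaces.items() if not occupied)
    taken = sorted(s for s, occupied in spaces.items() if occupied)
    return f"Free: {', '.join(free) or 'none'} | Occupied: {', '.join(taken) or 'none'}"

print(notification_for(2, 1))
print(reply_to_query({"P1": True, "P2": False, "P3": True, "P4": True}))
```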


Architecture & Behind the Scenes

In this scenario, the data is sent from the LoRaWAN sensors to our Cisco IXM LoRaWAN gateway, which is directly connected to a Cisco IR829 industrial router. Both devices are managed by the IoT Field Network Director with zero-touch deployment. The sensor data is then sent over the cellular network to the LoRaWAN network server ThingPark Enterprise (from Cisco partner Actility). The cellular connectivity of the IR829 is managed by Cisco Control Center, an industry-leading SaaS SIM-card management platform. The SIM card was provided by our partner KPN.

After the LoRaWAN sensor payload is decrypted, the data is forwarded via MQTT to a Python script and to InfluxDB, where the sensor data is stored long term. The Python script also orchestrates the Webex Teams bot notifications and serves as the back end for the website, supplying the latest parking information. Grafana is connected directly to InfluxDB.
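The step between the MQTT message and InfluxDB can be sketched as a function that turns one decrypted uplink into an InfluxDB line-protocol point. Several things here are assumptions: the JSON layout follows a ThingPark-style `DevEUI_uplink` envelope, the DevEUI-to-space mapping is hypothetical, and "first payload byte 0x01 means occupied" is a made-up encoding, because real parking sensors each define their own payload format. The MQTT and InfluxDB client calls are omitted.

```python
import json
from datetime import datetime

# Hypothetical mapping from sensor DevEUI to parking-space name.
SPACES = {"70B3D5000000A001": "P1", "70B3D5000000A002": "P2"}

def uplink_to_line_protocol(message: str) -> str:
    """Convert one decrypted uplink (JSON string) into an InfluxDB line-protocol point."""
    uplink = json.loads(message)["DevEUI_uplink"]   # assumed ThingPark-style envelope
    space = SPACES[uplink["DevEUI"]]
    payload = bytes.fromhex(uplink["payload_hex"])
    occupied = payload[0] == 0x01                   # assumption: 0x01 = car present
    ts = datetime.fromisoformat(uplink["Time"])     # expects an explicit UTC offset
    ns = int(ts.timestamp()) * 1_000_000_000 + ts.microsecond * 1_000
    # Line protocol: measurement,tag_set field_set timestamp ('i' marks an integer field)
    return f"parking,space={space} occupied={int(occupied)}i {ns}"

msg = json.dumps({"DevEUI_uplink": {
    "DevEUI": "70B3D5000000A001",
    "Time": "2020-06-16T08:00:00+00:00",
    "payload_hex": "01",
}})
line = uplink_to_line_protocol(msg)
print(line)
```

Storing occupancy as an integer field, rather than a boolean, makes it easy to sum or count in dashboard queries.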


◉ Special thanks to Michael Eder, who helped build this showcase application.

Saturday 13 June 2020

6 Essential Elements of Your Managed Detection and Response Lifecycle – Part 1

We’ve seen a sharp increase in the number of organizations growing their remote workforces over the last decade. In fact, at the start of 2020, the number of remote workers in the U.S. stood at 4.7 million, which represents 3.4% of the population.

The advent of cloud, multi-cloud, and hybrid cloud architectures has made it possible for businesses to rapidly adapt to changing workforces and working styles. However, these changes have also introduced new challenges in managing security operations.

The key reasons for this include:

◉ Workers are accessing organizations’ servers and applications remotely, which opens up new entry points for cyber attacks

◉ Employees are relying increasingly on cloud-hosted services to work and collaborate

◉ Remote workers are being targeted by more and more malware sites

◉ Employees fail to consistently practice good cyber hygiene

As the remote workforce grows and cyber threats stack up, it’s important that organizations can manage risk and uncertainty to keep critical assets secure. Where risks are known, actions are clear. But with unknown risks, there needs to be a focus on disciplined research and investigation. This generates the intelligence needed to develop detailed use cases, which give Security Operations (SecOps) teams a guide for responding to threats.

By defining known and unknown risk scenarios in your security operations lifecycle, you can meet the demands of remote workers using cloud and network services, while ensuring you remain protected.

Let’s explore how to establish a six-phase threat detection and response methodology that addresses uncertainty.

Managing uncertainty with disciplined security operations



Identify

Establishing a clear methodology for security operations teams to follow is a critical element of effective and efficient threat detection and response.

This methodology starts with identifying use cases. A use case is the definition and analysis of an attack method. In addition to the type of attack, a use case includes step-by-step detail on how the attack unfolds (e.g., exfiltration of data from an organization or a compromised privileged login), as well as possible control points for use in mitigation. Establishing a methodology that SecOps then uses to identify and create new use cases is crucial to ensuring the organization maintains a strong security posture.

Building a disciplined approach to use case identification and analysis is the foundation of your detection and response process, providing insights into use case relevancy and into how effectively organizational assets are protected.

Without these insights, you will lack the visibility needed to maximize the value of follow-on process steps such as developing, evaluating, and enhancing use cases.

Organizations that follow a defined methodology to discover, collect, refine, validate, and apply changes to use cases address a critical weakness in “set it and forget it” programs. These programs assume the security policies and use cases developed at the time of implementing advanced operations tools remain static – an assumption that can create broad gaps in your threat visibility.

Prioritize

Prioritizing use case development is very important because it directly affects how quickly your organization is ready to respond to specific threats. Teams often debate which use cases to do first, which are most important, and how to assess the lifecycle for additional use cases. While prioritization could be based on importance alone, you’re likely to be more effective balancing importance with feasibility (e.g., how complex and risky the use case is to implement) and with the speed at which a particular business operates.

Establishing a model to prioritize use cases will help you manage this balance. One approach is to create relative categories. For example:

◉ ‘Control’ based use cases relate to a regulatory objective, such as Payment Card Industry Data Security Standard (PCI DSS)

◉ ‘Threat’ based use cases leverage threat intelligence related to Tactics, Techniques, and Procedures (TTPs)

◉ ‘Data or Asset’ based use cases relate to specific datasets or assets that represent additional risk to the business

Reviewing new use cases in each of these categories with a balance between importance and feasibility provides a great strategy for new use case prioritization.
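One way to turn this model into a repeatable review is to record each use case with its category, its attack steps and control points, and two team-assessed ratings, and then rank within each category. The sketch below is hypothetical: it assumes 1-to-5 scales and weights importance and feasibility equally by multiplying them. Organizations will tune both the scales and the weighting to their own risk appetite.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UseCase:
    name: str
    category: str      # "control", "threat", or "data/asset"
    importance: int    # 1 (low) .. 5 (high), assessed by the team
    feasibility: int   # 1 (complex/risky to implement) .. 5 (simple/low risk)
    steps: List[str] = field(default_factory=list)           # how the attack unfolds
    control_points: List[str] = field(default_factory=list)  # where it can be mitigated

    @property
    def score(self) -> int:
        # Hypothetical weighting: importance and feasibility count equally.
        return self.importance * self.feasibility

def prioritize(use_cases):
    """Group use cases by category, each list ordered by descending score."""
    ranked = {}
    for uc in sorted(use_cases, key=lambda u: u.score, reverse=True):
        ranked.setdefault(uc.category, []).append(uc.name)
    return ranked

backlog = [
    UseCase("PCI DSS cardholder-data access", "control", 5, 3),
    UseCase("Data exfiltration over DNS", "threat", 4, 2,
            steps=["host compromised", "data encoded in DNS queries"],
            control_points=["DNS logging", "egress filtering"]),
    UseCase("Compromised privileged login", "threat", 5, 4),
    UseCase("Access to R&D file shares", "data/asset", 3, 4),
]
ranking = prioritize(backlog)
print(ranking)
```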

Thursday 11 June 2020

Why 5G is Changing our Approach to Security


While earlier generations of cellular technology (such as 4G LTE) focused on ensuring connectivity, 5G takes connectivity to the next level by delivering connected experiences from the cloud to clients. 5G networks are virtualized and software-driven, and they exploit cloud technologies. New use cases will unlock countless applications, enable more robust automation, and increase workforce mobility. Incorporating 5G technology into these environments requires deeper integration between enterprise networks and 5G network components of the service provider. This exposes enterprise owners (including operators of critical information infrastructure) and 5G service providers to risks that were not present in 4G. An attack that successfully disrupts the network or steals confidential data will have a much more profound impact than in previous generations.

5G technology will introduce advances throughout the network architecture, such as RAN decomposition, the use of APIs, container-based 5G cloud-native functions, and network slicing, to name a few. While these advancements enable new capabilities, they also expand the threat surface, opening the door to adversaries trying to infiltrate the network. Beyond the expanded threat surface, 5G also presents security teams with a steep learning curve: they must identify and mitigate threats faster without affecting latency or the user experience.

What are Some of the Threats?


Virtualization and cloud-native architecture deployment for 5G is one of the key concerns for service providers. Although virtualization has been around for a while, a container-based deployment model consisting of 5G Cloud Native Functions (CNFs) is a fresh approach for service providers. Apart from the known vulnerabilities in the open-source components used to develop 5G CNFs, most CNF threats are still unknown, which makes them riskier. Deploying CNFs in public and private clouds also brings another known and widespread problem: inconsistent and improper access control permissions that put sensitive information at risk.

5G brings network decomposition, disaggregation into software and hardware, and infrastructure convergence, which together underpin the emergence of edge computing network infrastructure, or MEC (Multi-Access Edge Compute). 5G edge computing use cases are driven by the need to optimize infrastructure through offloading, better radio, and more bandwidth to fixed and mobile subscribers. Low-latency use cases such as Ultra-Reliable Low Latency Communication (URLLC), one of several types of use cases supported by 5G NR, require user plane distribution. Certain 5G-specific applications and the user plane need to be deployed in the enterprise network for enterprise-level 5G services. The key threats in MEC deployments are fake/rogue MEC deployments, API-based attacks, insufficient segmentation, and improper access controls on MEC deployed at enterprise premises.

5G technology will also usher in new connected experiences for users with the help of massive numbers of IoT devices and partnerships with third-party companies that allow services and experiences to be delivered seamlessly. For example, in the auto industry, 5G combined with machine learning-driven algorithms will provide information on traffic and accidents, and will process peer-to-peer traffic between pedestrian traffic lights and vehicles in use cases such as Vehicle to Everything (V2X). Distributed Denial of Service (DDoS) attacks against these use cases are a critical part of the 5G threat surface.

What are Some of the Solutions to Mitigate Threats?


Critical infrastructure protection: Ensure your critical software, technologies, and network components, such as the Home Subscriber Server (HSS), Home Location Register (HLR), and Unified Data Repository (UDR), are secured with the right controls.

Cisco Secure Development Lifecycle: Being cloud-native and completely software-driven, 5G uses open-source technologies. Although this is critical for scalability and for cloud deployment integrations, vulnerabilities in multiple open-source applications could be exploited by attackers. To reduce the attack surface, service providers need to verify each 5G vendor’s secure development process to ensure hardened software and hardware. We offer security built into our architectural components. Our trustworthy systems technology includes trust anchor, secure boot, entropy, immutable identity, image signing, common cryptography, secure storage, and run-time integrity.

Vendor Assessment (security): It’s critical to validate vendor supply chain security, secure your organization’s development practices from end to end, and employ trustworthy products. You must also be vigilant about continuously monitoring hardware, software, and operational integrity to detect and mitigate infrastructure and service tampering. Sophisticated actors are looking to silently gain access and compromise specific behavior in the network. These attackers seek to take control of network assets to affect traffic flows or to enable surveillance by rerouting or mirroring traffic to remote receivers. Once they have control, they might launch “man-in-the-middle” attacks to compromise critical services like Domain Name System (DNS) and Transport Layer Security (TLS) certificate issuance.

Secure MEC & Backhaul: 5G edge deployments will supply virtualized, on-demand resources: an infrastructure that connects servers to mobile devices, to the internet, to other edge resources, and to the operational control systems for management and orchestration. These deployments need the right security mechanisms in the backhaul to prevent rogue deployments, and the right security controls to prevent malicious code deployments and unauthorized access. Because MEC deployments will include dynamic virtualized environments, securing these workloads will be critical. Cisco workload protection will help service providers secure these workloads, and Cisco’s Converged 5G xHaul Transport will provide service providers with the right level of features for secure 5G transport.

Cisco Ultra Cloud Core allows the user plane to support a full complement of inline services. These include Application Detection and Control (ADC), Network Address Translation (NAT), Enhanced Charging Service (ECS), and firewalls. Securing the MEC would require multiple layers of security controls based on the use case and the deployment mode. Some of the key security controls are:

• Cisco Security Gateway provides security gateway features along with inspections on GTP, SCTP, Diameter, and M3UA.

• Secure MEC applications: Securing virtualized deployments on the MEC and the centralized 5GC requires smarter security controls than firewalls alone, whether hardware or virtualized. Cisco Tetration provides multi-layered cloud workload protection using advanced security analytics and speedy detections.

• Secure MEC access: User access to MEC can be secured using the Zero Trust methodology, which is explained in greater detail below.

Using zero trust security controls during 5G deployment is critical for service providers. This is particularly important in the deployment phase, when multiple employees, vendors, contractors, and sub-contractors will be deploying and configuring various components and devices within the network. The old method of simply providing a VPN as a security control is insufficient, because the device used by the configuration engineer might already carry malicious code that could then be deployed within the 5G infrastructure. This whitepaper gives you more insight into how zero trust security can be applied to 5G deployments.

End to End Visibility: 5G brings distributed deployments, dynamic workloads, and encrypted interfaces like never before. This requires end-to-end visibility to ensure a proper security posture. Advanced threat detection and encryption methods can identify malware in encrypted traffic without requiring decryption. And because latency is very important in 5G, we can’t use the traditional approach of distributing certificates, decrypting traffic, analyzing the data for threats, and then re-encrypting it, as this adds too much latency to the network. Cisco Stealthwatch is the only solution that detects threats across the private network, the public cloud, and even in encrypted traffic, without the need for decryption.

Source: Cisco.com

Wednesday 10 June 2020

Cisco CCNP Security 350-701 Certification | Syllabus | Practice Test



Exam Name: Implementing and Operating Cisco Security Core Technologies

Exam Number: 350-701 SCOR

Exam Price: $400 USD

Duration: 120 minutes

Number of Questions: 90-110

Passing Score: Variable (750-850 / 1000 Approx.)

Recommended Training: Implementing and Operating Cisco Security Core Technologies (SCOR)

Exam Registration: Pearson VUE

Sample Questions: Cisco 350-701 Sample Questions

Practice Exam: Cisco Certified Network Professional Security Practice Test


Tuesday 9 June 2020

Stay Flexible and Prepared with Virtual Education by Webex

There is so much more to the world than the four walls of our classrooms. Distance learning is expanding the world for students, teachers, and administrators. More educational institutions of all types and sizes around the world are turning to Cisco Webex as their remote learning tool of choice. 

Virtual classroom doors never close, ensuring the continuity of our education systems. Whether your institution needs to serve summer school classes or wants to ensure a smooth and prepared entry into virtual education next school year, the Cisco Customer Experience (CX) team is here for you.

Keep Your Students Safe and Secure


You want to make sure your remote learning platform is an enabler, not a vulnerability. Built by the pioneer in video conferencing and an industry leader in cyber security, Webex is structured on various security frameworks, including end-to-end encryption. Always-on security runs unobtrusively in the background to keep all Webex participants safe and sensitive data secure. Let us help make sure you are satisfying the most stringent remote and distance learning security requirements.

Cisco CX QuickStart Implementation Services 


Education can’t wait. The CX team can facilitate the rapid and secure deployment of your remote and distance learning environment, so you can go about your core business of educating students, even through alternate means.


Users can leverage free on-demand self-help resources to get started, including recorded training sessions, quick-start guides, and tips and tricks, and IT can attend Business Continuity Ask the Expert (ATX) webinars, including a special “Enabling Virtual Education” session.

But you don’t have to go it alone. We recognize that educational IT teams are under a new level of pressure to serve their communities, students, faculty, and staff, and may not have the network infrastructure in place or be familiar with Webex solutions and tools. Our CX team offers multiple levels of QuickStart Implementation Services for smooth, simplified, and rapid deployment. 

We’ll help IT with these essential services:

1. Efficiently onboard students, faculty, and staff to a remote learning experience. We will introduce you to the administrative portal for user provisioning, which lets you efficiently control adding, updating, and deactivating users. Avoid the security vulnerabilities that come from manually executing changes for recurring school enrollment and staffing fluctuations.

2. Seamless integration with your single sign-on (SSO) system to the Cisco cloud. This will allow users to easily authenticate with their institutional credentials (username and password), while reducing calls to your helpdesk.

3. Focused hands-on training for your staff and students. Cisco will show teachers how to successfully get started, so your team can focus on other critical IT issues.
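The add/update/deactivate cycle from step 1 is, at its core, a comparison between the current enrollment roster and the accounts that already exist. The sketch below is purely illustrative and is not a Webex or Control Hub API. It assumes both sides are available as simple e-mail-to-display-name maps (all addresses and names are hypothetical) and only computes the change plan, which would then be applied through whichever provisioning interface the institution uses.

```python
def plan_user_changes(roster, provisioned):
    """Compare the enrollment roster with already-provisioned accounts.

    Both arguments map an e-mail address to a display name (hypothetical format).
    Returns sorted lists of accounts to add, update (name changed), and deactivate.
    """
    to_add = sorted(set(roster) - set(provisioned))
    to_deactivate = sorted(set(provisioned) - set(roster))
    to_update = sorted(e for e in set(roster) & set(provisioned)
                       if roster[e] != provisioned[e])
    return to_add, to_update, to_deactivate

roster = {"ana@school.example": "Ana Lopez",
          "ben@school.example": "Ben Ng",
          "cy@school.example": "Cy Park"}
provisioned = {"ben@school.example": "Benjamin Ng",
               "cy@school.example": "Cy Park",
               "dee@school.example": "Dee Roy"}
add, update, deactivate = plan_user_changes(roster, provisioned)
print(add, update, deactivate)
```

Generating the plan from data, rather than editing accounts by hand each term, is what removes the error-prone manual step the text warns about.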


For schools and learning institutions that want to use Webex for administrative collaboration, such as holding faculty meetings remotely, we can help integrate your business systems:

1. Hands-on help with integrating Microsoft Active Directory to the Cisco cloud. Meeting organizers can then easily look up staff from the school directory for scheduling administrative meetings.  

2. Expert assistance integrating your local calendar to the Cisco cloud. This will help your faculty and staff avoid calendar conflicts between virtual school meetings and regular activities. 

3. Leverage Cisco guidance and expertise to help with testing and overcome any technological challenges that emerge in the first two weeks after going into production, making the transition as seamless as possible.

Saturday 6 June 2020

Enterprise Network Availability: How to Calculate and Improve

Right now, I am sitting at home thinking about how the world is being held together by the Internet. So far, the Internet has stood up to our new reality amazingly well. This is despite redistributed traffic loads, and an explosive growth in interactive, high-bandwidth applications. For a brief time at least, even casual users are recognizing and appreciating the network’s robustness and availability.

We shouldn’t bask in this success yet. Failures will happen, and some of them might cause application impacts that could have been avoided. If you are an Enterprise Operator, now is the perfect time to examine your design assumptions against the new reality of your network. What weaknesses has the shift to telework exposed? What needs upgrading, given the shift in application mix and the resulting performance requirements?

One way or another, your end users will adapt to what you are providing. And it is best if your network is flexible and robust enough to meet new expectations. Nobody wants end-users to acclimate themselves to a degraded Enterprise network experience.

Key to supporting today’s needs is understanding the flexibility (or lack thereof) of the end-to-end network architecture. For your architecture, you need to understand:

◉ the behavior of deployed technologies/protocols,
◉ the strengths and weaknesses of embedded platforms, and
◉ how your topology can handle application demands while remaining resilient to failures.

Each of these impacts the resulting end-user experience, especially during failures.

But where do you start this architectural analysis? You first need to establish a quantitative basis that measures end-user application availability and performance under various failure scenarios. This is possible because there is a direct relationship between the probability of failure and the end user’s perception of network availability. Such a quantitative basis is essential, as availability with acceptable performance is ultimately how a network is judged.

Getting to Five Nines


The best-known network availability metric is “five nines”. Five nines means that the end user perceives their application to be available 99.999% of the time. This permits only 5.26 minutes of downtime a year. Depending on the application and network topology, this can be a very stringent standard.
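The downtime figures follow directly from the availability percentage: allowed downtime is the unavailable fraction multiplied by the length of a year. The short Python sketch below assumes a 365.25-day year (525,960 minutes), which reproduces the 5.26 minutes quoted above as well as the roughly 32 seconds for six nines used in the next paragraph.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years

def downtime_per_year(availability: float) -> float:
    """Allowed downtime in minutes per year for an availability between 0 and 1."""
    return (1 - availability) * MINUTES_PER_YEAR

for label, a in [("three nines", 0.999), ("five nines", 0.99999), ("six nines", 0.999999)]:
    minutes = downtime_per_year(a)
    print(f"{label}: {minutes:.2f} min ({minutes * 60:.0f} s) per year")
```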

Consider Figure 1 below, which shows serially connected routers, switches, access points, servers, and transited clouds. When these ten elements are connected without any redundancy, each element must be up and available 99.9999% (six nines) of the time for the end user to perceive five nines of availability. Because six nines allows only about 32 seconds of downtime a year, even a single reboot a year could prove problematic.


Figure 1: Serial Transport Availability
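For a serial chain, the end-to-end availability is the product of the element availabilities, because every element must be up at the same time. The sketch assumes element failures are independent. It shows why ten six-nines elements are needed for five nines end to end, and how ten three-nines elements in series drop to roughly 99%.

```python
from math import prod

def serial_availability(components):
    """End-to-end availability of independent elements in series: all must be up."""
    return prod(components)

# Ten elements in series, as in Figure 1
print(f"{serial_availability([0.999999] * 10):.7f}")  # ≈ 0.99999, i.e. five nines
print(f"{serial_availability([0.999] * 10):.4f}")     # ≈ 0.990, only about two nines
```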

The good news is that with the proper network, application, and services architecture, the individual devices making up the Internet do not need to support six nines of availability. All we need to do is add some redundancy. The network design in Figure 2 below is an example of such a well-architected, redundancy-based design. In this design, if each element fails independently and each element is available just 99.9% of the time, the end user will experience 99.999% availability.


Figure 2: Parallel Transport Availability
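Redundant elements in parallel fail only if all of them fail, so a stage's availability is one minus the product of the individual unavailabilities. The stages are then chained in series as before. The exact topology of Figure 2 cannot be recovered from the text, so the sketch assumes ten stages, each built from two independent elements at 99.9%. Under that assumption, each stage reaches six nines and the whole path reaches five nines.

```python
from math import prod

def parallel_availability(components):
    """Availability of independent redundant elements in parallel: at least one up."""
    return 1 - prod(1 - a for a in components)

def series_of_redundant_stages(stages):
    """Each stage is a list of redundant elements; the stages are chained in series."""
    return prod(parallel_availability(stage) for stage in stages)

# Assumption: ten stages, each with two independent elements at 99.9%
stages = [[0.999, 0.999] for _ in range(10)]
print(f"{parallel_availability(stages[0]):.7f}")       # ≈ 0.999999 per stage
print(f"{series_of_redundant_stages(stages):.7f}")     # ≈ 0.99999 end to end
```

Going from 0.1% to 0.0001% unavailability per stage is the "three orders of magnitude" the next paragraph refers to.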

Although the user’s experience is identical, the difference between the two figures above is huge. We have reduced the availability requirements of all component parts by three orders of magnitude, and we have made something highly reliable from less reliable parts. This really shouldn’t be surprising, however. From its very beginnings, the Internet was designed to remain available even when devices were lost to nuclear attacks.

In the decades since the Internet’s conception, Cisco has documented many technologies and approaches for achieving a very high degree of availability. A small subset includes fast-converging routing and switching protocols, device and link redundancy, and boot time reduction. But such technologies and approaches are only part of the availability equation. Network operators have the ultimate say in deploying these technologies to maximize network availability. Strategies range from distributing application servers across geographically and organizationally diverse datacenters, to redundant access and core networks, all the way to ensuring that fiber-optic cables from different service providers don’t run in the same conduit. These strategies have proven effective at providing high availability.

The result of all this good network design and planning is that the majority of application availability failures don’t come from equipment failures. Instead they come from equipment misconfiguration. Protecting the consistency of the network configuration is non-trivial and becomes more difficult as you add new technologies to the network. In fact, protecting network consistency is a key reason network operators are choosing to deploy controllers to manage device configuration based on higher level expressions of intent. One of the main goals of network controllers is to automatically ensure correct and consistent configuration of all of the equipment in the network.
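A toy model of what a controller checks for configuration consistency is to compare each device's running settings against the intended values and report any drift. The setting names, values, and device names below are hypothetical. Real intent-based controllers derive device configuration from much richer intent models, so this only illustrates the consistency check itself.

```python
def find_config_drift(intent, running_configs):
    """Report settings where a device's running config differs from intent.

    `intent` maps a setting name to the desired value for every device.
    `running_configs` maps a device name to {setting name: actual value}.
    Returns {device: {setting: (wanted, actual)}}; missing settings show as None.
    """
    drift = {}
    for device, config in running_configs.items():
        diffs = {key: (wanted, config.get(key))
                 for key, wanted in intent.items()
                 if config.get(key) != wanted}
        if diffs:
            drift[device] = diffs
    return drift

intent = {"ntp_server": "10.0.0.1", "mtu": 9000, "bfd": True}
running = {
    "core-1": {"ntp_server": "10.0.0.1", "mtu": 9000, "bfd": True},
    "core-2": {"ntp_server": "10.0.0.1", "mtu": 1500, "bfd": True},
    "edge-1": {"ntp_server": "10.0.0.2", "mtu": 9000},
}
drift = find_config_drift(intent, running)
print(drift)
```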

Intent, while very useful in this role, might not address every dimension of application availability. Consider Figure 3 below, which shows an Enterprise network integrated with a Public Cloud topology.


Figure 3: Public Cloud Apps need Enterprise Authentication

In this network design, public cloud-based applications accessed solely over cellular data do not depend on the cloud alone. They still depend on the accessibility of the Enterprise’s RADIUS authentication infrastructure. In other words, a cloud-based application will at best be only as available as access to your Enterprise data center. This is a nuance which very few end users will be able to recognize or troubleshoot as a cause of availability issues.

New Technologies Add Risks to Availability


It is not just the Enterprise’s authentication infrastructure which we need to consider when thinking about the future of availability. A set of forces is changing network design. Geoffrey Moore has done much work describing the continuous cycle of technology invention and deployment. Based on this, it is best to think of the network as a continually changing entity.

Figure 4 below shows a subset of the forces and technologies which are top-of-mind in Enterprise network design. Each of these can improve application availability, or degrade it if it is not taken into consideration during network design.


Figure 4: Emerging Technologies Use Controllers

With the advent of Software-Defined Networking (SDN), the emergence and growth of new types of controllers is a trend which broadly impacts network availability calculations. In Figure 4 above, a number of technologies are marked with a star (*). Each star represents a new controller involved in the establishment and maintenance of an application flow, and each one adds a transactional subsystem which must be included in the calculation of network availability.

What are examples of these transactional subsystems? Historically, we have depended on transactional subsystems such as DNS, BGP, DHCP, and Wireless LAN Controllers. As networks evolve, we are seeing the proposal or introduction of new transactional subsystems such as OpenFlow servers. We are also seeing the evolution of existing transactional subsystems such as RADIUS/Identity. The RADIUS/Identity evolution is particularly important here: user and workload identification is becoming more complex as cloud systems are integrated into the Enterprise. It is worth considering the impacts on application availability as corporate access control becomes more deeply integrated into the cloud via technologies like Azure AD, Google IAP, SPIFFE, and ADFS.

Calculating the Availability of a Component Subsystem


The emerging technologies listed above are changing established network availability profiles. As a result, now is a good time for you to revisit any previous calculations. And if you have no previous calculations, this may be an excellent time to calculate your availability and determine whether it is appropriate for your business.

If you are looking to get started, an excellent primer is the Cisco Press book “High Availability Network Fundamentals”. Although it is from 2001, it remains highly relevant. The book’s free introductory chapter discusses two base concepts upon which system-level availability calculations are ultimately constructed. The first concept is Mean Time Between Failures (MTBF). MTBF is equal to the total time a component is in service divided by the number of failures. The second concept is Mean Time To Repair (MTTR). MTTR is equal to the total down time divided by the number of failures. You can also think of MTTR as the mean total time to detect a problem, diagnose it, and resolve it. Using these two concepts, it becomes possible to calculate expected component availability via the equation:

$$A_{\text{component subsystem}} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

In this equation, “A” stands for availability, which is expressed as a probability between 0% and 100%. Key in the equation are the words “component subsystem”. A component subsystem can be one device. A component subsystem can also be a network of devices. A component subsystem can even be infrastructure software running on a cloud of virtual hardware. What is critical for the equation is that the failure modes of this component subsystem are understood and can be quantified.
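To make the equation concrete, here is a minimal Python sketch that computes component availability from MTBF and MTTR and converts the result into expected downtime per year. The function names and the 100,000-hour MTBF / 4-hour MTTR inputs are hypothetical illustrations, not vendor figures; the only requirement is that both inputs use the same time unit.

```python
# Minimal sketch: A = MTBF / (MTBF + MTTR) for one component subsystem.
# Function names and example numbers are hypothetical illustrations.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected availability of a component subsystem (both inputs in hours)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(a: float) -> float:
    """Expected outage minutes per year for an availability a between 0 and 1."""
    return (1.0 - a) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Hypothetical device: 100,000 h MTBF, 4 h MTTR.
    a = availability(100_000, 4)
    print(f"A = {a:.6f} (about {downtime_minutes_per_year(a):.1f} minutes of downtime per year)")
```

Note that MTTR matters as much as MTBF: since unavailability is MTTR / (MTBF + MTTR), halving the repair time has exactly the same effect as doubling the time between failures.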

While the equation itself is simple, quantifying MTBF and MTTR for any component subsystem does take some effort. To begin with, you should acquire your vendor’s MTBF estimates for its equipment. You may then choose to adjust these vendor MTBF estimates by considering factors as diverse as the age of the equipment and even your local weather. But equipment MTBF is only part of the picture. MTBF for transmission links should also be considered. When estimating these numbers, you need to consider questions such as “how often do you see cable cuts or other failures in your environment?” and “how well secured are your networking patch panels?”

Beyond MTBF is the MTTR of your component subsystem. Getting a history of your MTTR is easy: all you need to do is divide the total outage time by the total number of repairs during a given reporting interval. But your historical MTTR might not be an accurate predictor of your future MTTR, because the longest (and most painful) outages are infrequent. The best way to predict future MTTR is to estimate the average time it takes to make a repair across the universe of all conceivable repairs. This helps you start quantifying infrequent issues. Especially if you are a small Enterprise, you really want to understand the hours or days it might take to diagnose a new type of issue and then get a spare part installed or a cable repaired by qualified local support staff.
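The difference between a historical MTTR and a forward-looking estimate can be sketched in Python as follows. The repair scenarios, their relative frequencies, and their repair times are entirely hypothetical; the point is that a rare but slow repair type (such as diagnosing a new issue and waiting for a shipped spare) can dominate the estimate even though it rarely shows up in last year’s history.

```python
# Minimal sketch contrasting historical MTTR with a frequency-weighted estimate.
# All outage durations, scenarios and weights below are hypothetical.
from dataclasses import dataclass


def historical_mttr(outage_hours: list[float]) -> float:
    """Total outage time divided by the number of repairs in the interval."""
    if not outage_hours:
        raise ValueError("no repairs recorded in this interval")
    return sum(outage_hours) / len(outage_hours)


@dataclass
class RepairScenario:
    name: str
    relative_frequency: float  # expected share of failures of this type
    hours_to_repair: float     # detect + diagnose + resolve


def expected_mttr(scenarios: list[RepairScenario]) -> float:
    """Frequency-weighted average repair time across all conceivable repair types."""
    total_weight = sum(s.relative_frequency for s in scenarios)
    if total_weight <= 0:
        raise ValueError("scenario weights must sum to a positive value")
    weighted = sum(s.relative_frequency * s.hours_to_repair for s in scenarios)
    return weighted / total_weight


if __name__ == "__main__":
    past = [0.5, 1.0, 0.75, 2.0]  # last year's outages, in hours
    future = [
        RepairScenario("configuration rollback", 0.70, 0.5),
        RepairScenario("line card swap, spare on site", 0.20, 4.0),
        RepairScenario("new issue type, spare shipped", 0.08, 48.0),
        RepairScenario("fiber cut, provider repair", 0.02, 24.0),
    ]
    print(f"historical MTTR: {historical_mttr(past):.2f} h")
    print(f"expected MTTR:   {expected_mttr(future):.2f} h")
```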

If you are interested in quantified examples of MTBF and MTTR, again I recommend “High Availability Network Fundamentals”. This book explores the specifics at a useful level of depth.

Looking back at the component subsystem availability equation, it is important to remember that what counts as a failure at the overall system level is unlikely to be the same as what counts as a failure in a component subsystem. For example, in Figure 2, a failure of any single router component should be invisible at the overall system layer. In other words, such a failure does not count against system-level MTBF, as there is no user-perceived system failure.

However, if there are concurrent failures in redundant subsystems, there will be outages at the system level. We need to account for this in our availability calculations.

Luckily, most network failures are independent events. And where networks do have cascading outages, these are often the result of underestimating the traffic that must be supported during failure events. As a result, simulating failure events while the network carries peak-period traffic should help ensure that adequate link capacity is provisioned. And assuming link capacities are properly dimensioned, traditional system-level availability equations, such as those we describe in this article, can then be applied.

As a network designer, it is important to identify failure domains which can span subsystems. For example, if a clustered database is shared between two nodes, then a failure there can impact what you considered a redundant subsystem. When this is a possibility, it is necessary to account for this failure type at the system level, being careful not to also double-count that outage type at the component subsystem level.
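The shared-database case can be sketched in Python: the shared dependency is modeled once, in series with the redundant pair, rather than being folded into each node’s availability. The function names and values are hypothetical, and the node availability is assumed to exclude database failures so that nothing is double-counted.

```python
# Minimal sketch: a redundant pair of nodes that both depend on one shared
# clustered database. The database is modeled once, at the system level, in
# series with the pair; the per-node figure is assumed to EXCLUDE database
# failures to avoid double-counting. All values are hypothetical.


def redundant_pair(a_node: float) -> float:
    """Two independent nodes in parallel: 1 - P(both nodes are down)."""
    return 1.0 - (1.0 - a_node) ** 2


def with_shared_dependency(a_node: float, a_shared: float) -> float:
    """The redundant pair in series with a dependency that both nodes share."""
    return a_shared * redundant_pair(a_node)


if __name__ == "__main__":
    pair = redundant_pair(0.999)
    system = with_shared_dependency(0.999, 0.9995)
    print(f"pair alone: {pair:.6f}, with shared database: {system:.6f}")
```

However redundant the nodes are, the result can never exceed the availability of the shared database, which is why such cross-subsystem failure domains deserve attention.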

Once you have a handle on your subsystems, you can start assembling larger availability estimates using the three probability equations listed below:


Serial Transport Availability

The first of these probability equations is used to calculate availability when several transport subsystems exist in series. Here each transport subsystem encompasses its own failure domain, with its own availability estimate. The availability of serial transport subsystems is the product of the availabilities of all the subsystems, as the component subsystem failure domains are serialized. That is, if any subsystem in the chain fails, the whole system fails. Below is an example of how such a network availability calculation might be made for a simple Enterprise topology where the user application is connected via WiFi to a server located in an Enterprise data center.
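A minimal Python sketch of the serial calculation follows. The subsystem names and availability values are hypothetical placeholders, not the numbers from the article’s example.

```python
# Minimal sketch of serial availability: the product of every subsystem's
# availability, because a failure anywhere in the chain fails the whole path.
# Subsystem names and values are hypothetical placeholders.
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
    return prod(availabilities)


if __name__ == "__main__":
    path = {
        "WiFi access": 0.9995,
        "campus distribution": 0.9999,
        "Enterprise core": 0.9999,
        "data center network": 0.9999,
        "application server": 0.9995,
    }
    print(f"end-to-end availability: {serial_availability(list(path.values())):.5f}")
```

Because each factor is at most 1, adding a subsystem in series can only lower the end-to-end result; the chain is always less available than its weakest link.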


Parallel Transport Systems Availability

The second of these equations applies where transport systems exist in parallel. In other words, one transport subsystem backs up another. These are, unsurprisingly, known as parallel transport subsystems. The availability of a parallel transport subsystem is 1 minus the probability that all of the parallel subsystems are out at the same time. A good example of such a design would be your home Wi-Fi backed up by your service provider’s wireless data service.
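The parallel equation can be sketched the same way. This assumes the parallel subsystems fail independently (a shared dependency such as common power would break that assumption), and the Wi-Fi and cellular values are hypothetical.

```python
# Minimal sketch of parallel availability: 1 - P(all parallel subsystems are down).
# Assumes independent failures; example values are hypothetical.
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    home_wifi, cellular = 0.99, 0.98  # hypothetical
    backed_up = parallel_availability([home_wifi, cellular])
    print(f"Wi-Fi alone: {home_wifi}, with cellular backup: {backed_up:.4f}")
```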

In practice, parallel transport subsystems will eventually connect to some serial subsystem. This is because application servers will typically exist within a single administrative domain. A more complex example of parallel subsystems in practice is shown in the figure below. Here an SD-WAN service is used to back up an Enterprise core network, but the application servers exist in a single datacenter.
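Combining the two equations, a sketch of the SD-WAN design might look like this: the Enterprise core and the SD-WAN service form a parallel pair, which then sits in series with the access network and the single datacenter. The function name and all values are hypothetical, and failures are assumed to be independent.

```python
# Minimal sketch: an Enterprise core backed up by SD-WAN (parallel), feeding a
# single datacenter (serial). Assumes independent failures; values are hypothetical.
from math import prod


def serial(availabilities: list[float]) -> float:
    """All subsystems must be up."""
    return prod(availabilities)


def parallel(availabilities: list[float]) -> float:
    """At least one subsystem must be up."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def branch_to_datacenter(a_access: float, a_core: float, a_sdwan: float, a_datacenter: float) -> float:
    wan = parallel([a_core, a_sdwan])              # either WAN path will do
    return serial([a_access, wan, a_datacenter])   # access and datacenter are single points


if __name__ == "__main__":
    print(f"{branch_to_datacenter(0.9995, 0.999, 0.995, 0.9999):.6f}")
```

However good the WAN redundancy, the result stays below the availability of the single datacenter and the access network.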


Business Critical Transactional System Availability

The third equation calculates business critical transactional availability. This calculation is much like that of the serial transport calculation in that the product of all subsystems is included. However, as a transactional subsystem might only be required at or before flow initiation, it is sometimes useful to separate out this calculation, as shown in the figure below. Here the application user is accessing the network via campus WiFi, the application is itself sitting in public cloud, and the Application Authentication Server (such as a RADIUS single sign-on server) is in the Enterprise datacenter.
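A minimal sketch of this calculation keeps the transport product and the transactional product separate, so the contribution of the authentication server is visible on its own. The grouping of subsystems and all values are hypothetical.

```python
# Minimal sketch: flow availability = (product of transport subsystems) x
# (product of transactional subsystems needed at or before flow initiation).
# Subsystem grouping and values are hypothetical.
from math import prod


def transactional_availability(transport: list[float], transactional: list[float]) -> float:
    """Product of transport availabilities and transactional-subsystem availabilities."""
    return prod(transport) * prod(transactional)


if __name__ == "__main__":
    transport = [0.9995, 0.9999, 0.9999]  # campus WiFi, Internet path, cloud application
    transactional = [0.9999, 0.9995]      # path to Enterprise datacenter, authentication server
    print(f"transport only:   {prod(transport):.6f}")
    print(f"with auth server: {transactional_availability(transport, transactional):.6f}")
```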


Such a calculation shows that the availability of the cloud service depends on the availability of the Enterprise Application Authentication Server. Interestingly, a user might need to acquire authentication credentials only once a day, and then use them to access the cloud service for the remainder of the day. Such caching of transactional information can itself improve availability and scale.
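One way to sketch the effect of credential caching is a simplified, hypothetical model: every flow needs the transport path, but only a fraction of flow initiations must reach the authentication server, while the rest reuse cached credentials. This estimates the expected share of successful flow initiations rather than time-based availability, and it assumes independent failures; the function name and values are illustrative only.

```python
# Hypothetical simplification: every flow needs the transport path, but only a
# fraction of flow initiations must reach the authentication server; the rest
# reuse cached credentials. Assumes independent failures.


def flow_success_with_cache(a_transport: float, a_auth: float, fraction_needing_auth: float) -> float:
    """Expected fraction of flow initiations that succeed."""
    if not 0.0 <= fraction_needing_auth <= 1.0:
        raise ValueError("fraction_needing_auth must be between 0 and 1")
    # A flow fails on authentication only if it needs a live credential fetch
    # AND the authentication server is down at that moment.
    return a_transport * (1.0 - fraction_needing_auth * (1.0 - a_auth))


if __name__ == "__main__":
    for f in (1.0, 0.1, 0.01):
        print(f"fraction needing auth = {f:<5}: {flow_success_with_cache(0.9993, 0.9994, f):.6f}")
```

With a fraction of 1.0 this reduces to the plain transactional product; as caching lowers the fraction, the result approaches the transport-only availability.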

As you use these equations, remember that your results can be no better than the underlying assumptions. For example, each equation is most easily applied to a strictly hierarchical topology consisting of uniformly deployed equipment types. Topologies with rings and irregular layering either need far more complex equations or require simplifying assumptions, such as accepting that users will have slightly different experiences depending on where they sit within your topology.

Results of Modeling


After you have constructed these system and component level equations, measure your network against them! It is this measurement data which will enable you to prove or disprove the MTBF and MTTR assumptions you have made. It might even enable you to make changes before a more serious outage adversely impacts your business.
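As a starting point for that measurement, the sketch below derives observed availability, MTBF, and MTTR from an outage log using the definitions given earlier (time in service divided by failures, and down time divided by failures), then compares the result with a modeled prediction. The outage durations and the predicted value are hypothetical.

```python
# Minimal sketch: derive measured availability, MTBF and MTTR from an outage log,
# then compare with a modeled prediction. Outage data and prediction are hypothetical.


def measured_stats(outage_hours: list[float], period_hours: float) -> tuple[float, float, float]:
    """Return (availability, MTBF, MTTR) observed over one reporting period."""
    downtime = sum(outage_hours)
    if period_hours <= 0 or downtime > period_hours:
        raise ValueError("downtime must fit within a positive reporting period")
    uptime = period_hours - downtime
    failures = len(outage_hours)
    availability = uptime / period_hours
    mtbf = uptime / failures if failures else float("inf")  # time in service / failures
    mttr = downtime / failures if failures else 0.0          # down time / failures
    return availability, mtbf, mttr


if __name__ == "__main__":
    predicted = 0.9995          # hypothetical output of your availability model
    outages = [0.5, 2.0, 1.5]   # hypothetical outages over one year, in hours
    a, mtbf, mttr = measured_stats(outages, 365 * 24)
    print(f"measured A = {a:.6f} vs predicted {predicted}; MTBF = {mtbf:.0f} h, MTTR = {mttr:.2f} h")
    if a < predicted:
        print("measured availability is below the model: revisit the MTBF/MTTR assumptions")
```

With these definitions, MTBF / (MTBF + MTTR) equals the measured availability exactly, so the measured values can be fed straight back into the component equation.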

When you have modeled and measured for a while, you will see that a well-designed, redundant network architecture plays a paramount role in achieving excellent and predictable availability. Additionally, you will internalize how good design allows networks capable of five nines to be constructed out of subsystems which individually are not nearly as available.
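A small sketch illustrates that point: it finds how many independent parallel paths of a given availability are needed to reach a five-nines target, and expresses availability as a number of nines. It deliberately ignores serial elements and shared failure domains, and the per-path values and the `max_paths` cap are hypothetical.

```python
# Minimal sketch: how many independent parallel paths of a given availability
# are needed to reach a target? Assumes independent failures and ignores any
# shared failure domains or serial elements. Inputs are hypothetical.
import math


def nines(a: float) -> float:
    """Availability expressed as a number of nines (0.999 is about 3)."""
    if not 0.0 <= a < 1.0:
        raise ValueError("availability must be in [0, 1)")
    return -math.log10(1.0 - a)


def paths_needed(a_path: float, target: float, max_paths: int = 1000) -> int:
    """Smallest n such that 1 - (1 - a_path)**n >= target."""
    if not 0.0 < a_path <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < a_path <= 1 and 0 <= target < 1")
    n = 1
    while 1.0 - (1.0 - a_path) ** n < target:
        n += 1
        if n > max_paths:
            raise ValueError("target not reachable within max_paths")
    return n


if __name__ == "__main__":
    for a_path in (0.99, 0.999):
        n = paths_needed(a_path, 0.99999)
        combined = 1.0 - (1.0 - a_path) ** n
        print(f"{a_path} per path -> {n} paths, about {nines(combined):.1f} nines")
```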

The results of such calculation efforts might even provide the business justification needed to make fundamental changes to your network architecture, allowing you to achieve five nines. This should not be surprising: this result has been borne out by decades of network operator experience across a variety of deployment environments.

What are your experiences?


As mentioned above, these methods of calculating availability are not new. However, they can seem heavyweight, especially to network operators not used to such quantification. As a result, network operators sometimes make simplifying assumptions. For example, some Enterprise operators will assume that their Internet backbone providers are 100% available. Such an assumption can be a reasonable simplification, as the backbone might not be part of that individual’s personal operational metrics.

So how do you measure the availability of your operational environment? It would be great to hear from you below on any rules of thumb you use, as well as any simplifying assumptions you make!