Friday, 15 January 2021

Cisco’s Data Cloud Transformation: Moving from Hadoop On-Premises Architecture to Snowflake and GCP

The world is seeing an explosion of data growth. There are countless data-generating devices, digitized video and audio content, and embedded devices such as RFID tags and smart vehicles that have become our new global norm. Cisco is experiencing this dramatic shift as more data sources are being ingested into our enterprise platforms and business models are evolving to harness the power of data, driving Cisco’s growth across Marketing, Customer Experience, Supply Chain, Customer Partner Services and more.

Enterprise Data Growth Impact on Cisco

Enterprise data at Cisco has also grown over the years, with the size of legacy on-premises platforms growing 5x in the past five years alone. The appetite and demand for data-driven insights have also grown exponentially as Cisco realized the potential of driving growth and business outcomes with insights from data, revealing new business levers and opportunities.

Cloud Data Transformation Drivers 

When Cisco started its migration journey several years ago, its data warehouse footprint was entirely on-premises. With the business pivoting towards an accelerated data-to-insights cycle and the demand for analytics exploding, it quickly became apparent that some of the existing technologies would not allow us to scale to meet data demands.

Why Snowflake and GCP?


Key technology leaders and architects within Data & Analytics conducted market assessments of various data warehousing technologies and reviewed Gartner assessments to shortlist products. We then performed comparative capability assessments and performance-benchmarked POCs with representative Hadoop workloads. Ongoing operational costs are a critical success factor for any solution, so cost, weighed against performance and ease of use, was a key decision factor.

After significant evaluation, Snowflake and Google Cloud Platform were the chosen Cloud Platforms; Snowflake for our enterprise data and GCP for unstructured data processing and analytics.

Our early POCs indicated that Snowflake was 2-4 times faster than Hadoop for complex workloads. The fact that this was ANSI SQL-based yielded several advantages, including a larger qualified talent pool, shorter development cycles, and improved time to capability. The platform also offered a higher concurrency and lower latency compared to Hadoop. Snowflake was a clear winner!

GCP, by virtue of the rich set of tools it provides for analytics, was the chosen solution across multiple organizations in the enterprise and was a natural choice for analytics with the data residing in Snowflake.

Journey and key success factors


To migrate to Snowflake and GCP, we had to mobilize the enterprise to migrate out of Hadoop within a six-quarter timeline. From a central program management perspective, monumental effort went into planning, stakeholder engagement, vendor selection, and training and enablement of the entire enterprise.

As of December 2020, 100% of the Hadoop workload has been migrated to Snowflake, with key stakeholders like Marketing, Supply Chain, and CX fully migrated and leveraging the benefits of the Cloud Platform.

Some of the key enablers for our successful migration within such a short timeframe include:

1. Security certification: The first question from all of our enterprise stakeholders was on the security aspects of storing our data in the Cloud. Extensive work was done with InfoSec and the cryptography team on enabling security with IP whitelisting and Cisco’s private key encryption with Snowflake’s tri-secret secure feature. A lot of attention also went into the D&A Data foundation architecture to enable Role-Based Access Control (RBAC) and granular role separation to manage applications safely and securely.

2. Innovation with foundational capabilities: Right from the start, we knew that foundational capabilities were critical to accelerating the migration for the enterprise: ingesting data from on-prem sources to the cloud, maintaining data quality in the cloud data warehouse, and automating the onboarding of new users and applications. The innovative enabler we are especially proud of is the custom ingestion framework that moves data from our on-prem sources to Snowflake at a speed of ~240MBPS, with an average of 12TB of incremental data ingested into Snowflake each day.

3. Automation, automation, automation: This was our mantra. With a talented team, we developed APIs for security enforcement tasks such as token and DB credential rotation, and automated common administration and data access flows. We also built client-facing tools so application teams could own and meter their performance and costs; cop jobs and self-service warehouse resizing are two such examples.

4. Proactive cost management: One key paradigm shift in the Cloud is that platform costs are no longer someone else's problem, or something you worry about only every few years when planning for capacity. With the ability to track usage and costs at a granular level by application comes the responsibility to manage costs better. Visibility into these usage patterns is key to enabling actionable insights for each application team. Data & Analytics has enabled several dashboards that display costs, usage trends over time, a prediction of costs based on current trends, and more. Alerts are also sent based on customizable criteria, such as a week-over-week spike.
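
As a rough illustration of that kind of alert (not Cisco's actual implementation), a week-over-week spike check is only a few lines of pandas; the table layout, application names, and 50% threshold below are all hypothetical.

import pandas as pd

# Hypothetical weekly credit-usage export per application.
usage = pd.DataFrame({
    "app": ["marketing", "marketing", "supply_chain", "supply_chain"],
    "week": ["2020-11-30", "2020-12-07", "2020-11-30", "2020-12-07"],
    "credits": [120.0, 310.0, 95.0, 101.0],
})
usage["week"] = pd.to_datetime(usage["week"])
usage = usage.sort_values(["app", "week"])

# Week-over-week percentage change per application.
usage["wow_change"] = usage.groupby("app")["credits"].pct_change()

# Flag anything that grew more than 50% week over week (threshold is arbitrary).
alerts = usage[usage["wow_change"] > 0.5]
print(alerts[["app", "week", "credits", "wow_change"]])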

5. Enterprise enablement: With the monumental task of having to migrate nearly 300 applications, developed over five years in Hadoop, to Snowflake in 6 quarters, it was critical to ensure that the technology barrier was reduced right away. Over 25 training sessions were conducted with over 3000 participants trained over the course of FY20. This, coupled with numerous working sessions with Snowflake and Data & Analytics architects to share best practices and learnings across the teams, enabled a successful migration for all our stakeholders.

6. Enterprise alignment: Lastly (but definitely not least), ensuring we had stakeholder buy-in early in the game was critical to the success of a transformation at this scale. We worked at the grassroots level with the execution team, the leadership team, and executives to secure commitment and support for this enterprise-wide program.

Results observed and testimonials


As a data warehousing platform, Snowflake has significantly surpassed the performance of our legacy platform across multiple dimensions, both in reporting and transformations. Transformation jobs that would take 10 or more hours to run are now completing within an hour, a 10x performance improvement. This provides our business teams more current data on their dashboards, allowing for more accurate insights based on the latest data. Reports are now on average 4 times faster, with a 4x concurrency improvement, which gives our analysts the flexibility to run reports in parallel based on business needs.

The simple SQL-based technology has reduced the overall time to develop new capabilities or enhance existing ones. Our enterprise stakeholders report about 30% productivity improvement allowing faster time to capability, a key goal with this journey.

Some Testimonials:

◉ “The Cloud will help us deliver insights to drive business growth, agility needed for faster and for more informed decision making, and improve productivity” — Digital Marketing

◉ “Customer Service agents can immediately pull case reports and support Cisco customers on average 20x faster than Hadoop” — Customer Experience

◉ “Virtual Demand Center users on Snowflake receive more accurate customer and partner data and receive leads that are more likely to buy.” — Sales and Marketing

The Cloud Data Platform's rapidly evolving features also bring additional avenues to improve data governance, enforce more granular data security, harness the power of data (both public and Cisco data), more effectively partner with our customers and partners, and deliver data-driven outcomes.

Source: cisco.com

Tuesday, 19 May 2020

Cisco Threat Response takes the leap with SecureX

Reimagine the grocery delivery experience


Even in typical times, grocery and household shopping is time consuming, especially if you need to visit multiple stores: a main supermarket for your basics, a specialty store to accommodate diet restrictions, and another for bulk items. In a fast-paced world, with time spent working, family caregiving, and other responsibilities, grocery shopping is a tedious but necessary chore…or is it? The evolution of acquiring groceries and household goods has been one to watch as grocery delivery services, such as Instacart and Shipt, become increasingly relevant. These companies have each built a platform with a network of grocery providers to solve the problem: a simple and efficient way for customers to purchase groceries without having to leave their homes.

Now let's take grocery shopping to the next level. What if you didn't even need to proactively browse items and put them in your Instacart grocery order? Imagine if your "smart" refrigerator had sensors to detect inventory levels and connected to Instacart, your recipes, and your meal planning apps. Groceries could be ordered automatically or on demand based on the menu you've planned and what you actually need. One platform with all of your apps integrated and automated to simplify not only your grocery shopping experience but your entire cooking experience. This and many other platform experiences have been developing over the last several years to bring two (or more) sides of a connection together with more efficiency and use cases.

What does grocery shopping have in common with cybersecurity?


The cybersecurity industry is ripe for this type of innovation. We all know that the industry has historically been quite fragmented: at last count, an estimated 3,000+ vendors are in this space, and customers use, on average, 75 security tools. What does that mean for your security teams? Multiple tools that share limited context with one another, and incomplete, labor-intensive workflows. Going back to the grocery experience, this is akin to visiting seven different stores in one day to tackle a shopping list for each store and hoping you don't miss an item. Also consider the high lifecycle costs associated with maintaining interoperability, which is often limited. When you take into account an ever-evolving threat landscape and attack surface, this trend is not sustainable.

A platform journey two years in the making


Nearly two years ago, Cisco Threat Response debuted to combat this problem for Security Operations teams. As a valuable add-on application to several Cisco Security products — at no additional cost – Threat Response accelerated investigations and remediation by aggregating and correlating intelligence and data across your security products, both Cisco and third party. Threat Response has helped nearly 9,000 customers simplify their security operations. As Don Bryant, CISO for The University of North Carolina at Pembroke, says, “Having a holistic security platform has helped us simplify and accelerate our security operations. All of our tools seamlessly integrated through Threat Response gives us one view into our layered protection and valuable time back.”

Figure 1: Cisco Threat Response application for threat investigation and remediation

As background, Threat Response provides a visual, real-time answer to whether, and how, threats have impacted your environment, so you can take first-strike response actions in the same interface. Security operations teams use Threat Response to:

◉ Aggregate global threat intelligence: Search, consume, and operationalize threat intelligence from both public and private sources, with one application.

◉ Accelerate threat hunting and investigations: Visualize threats and incidents across multiple technologies in one view, then take response actions without leaving the console.

◉ Simplify incident management: Coordinate security incident handling across technologies and teams by centralizing and correlating alerts and triaging those that are high priority.

Now we’re continuing our mission of simplifying security and building on Threat Response core capabilities with SecureX, a built-in platform experience included with Cisco Security products. SecureX will make life even easier for Security Operations, and will also benefit Network Operations and IT Operations. Let’s talk about this evolution.

Is SecureX just a cool new name for Threat Response?


Since we announced SecureX at RSA Conference in February, you might be wondering, what’s the difference between Threat Response and SecureX? Are they one and the same – and SecureX is just a sleek rebranding?

The short answer is no. If Threat Response is like the Instacart of today, SecureX is the reimagined seamless grocery shopping experience we’ve envisioned above. Whether it’s the grocery or cybersecurity industry, the goal is always simplification. SecureX builds upon Threat Response’s core concepts of integrating your security products – both Cisco and third-party tools – to simplify security operations. Leveraging the success of Threat Response with Security Operations teams, SecureX takes this foundation to the next level to drive collaboration between SecOps, NetOps, and ITOps. SecureX simplifies security through:

1. Unifying visibility across your entire security environment.

2. Enabling automation in workflows to maximize your operational efficiency by eliminating repetitive tasks and human error.

3. Adding more out-of-box interoperability to unlock new potential from your Cisco Security investments and cascade them across your existing security infrastructure.

Figure 2: SecureX connects your entire security infrastructure

Enhanced Threat Response capabilities, now part of SecureX


Now as a key component of SecureX, Threat Response is enhanced to unlock even more value from your investments. Here’s how:

◉ You already know that Threat Response aggregates and correlates security context from multiple technologies into a single view, but now as SecureX threat response, users will have a customizable dashboard with ROI metrics and operational measures. And when you leave the dashboard, SecureX follows you to maintain contextual awareness and improve collaboration wherever you are in your Cisco Security infrastructure.

◉ Users will now be able to cut down investigation time even further by automating threat hunting and investigation workflows. With the orchestration feature in SecureX, users can set up event-based triggers to periodically hunt for indicators of compromise, create or add to a casebook, and post a summary in a chat room for collaboration.

◉ Threat Response had been rapidly growing its partner ecosystem, and SecureX not only expands the ecosystem instantly upon commercial availability but extends past it to include your core infrastructure. Together, our out-of-box interoperability with built-in and pre-packaged integrations from Cisco or select technology partners reduces the time spent integrating multiple technologies, or worse, working across multiple consoles. We’ll continue to support custom integrations via APIs, so any of the features of SecureX will work with your existing investments.

Similar to the reimagined grocery experience, SecureX brings greater efficiency and simplification in the midst of major market forces. The enhanced visibility, automation, and integrated platform capabilities with SecureX threat response further reduces mean dwell time by accelerating investigations and MTTR for SecOps. Without having to swivel between multiple consoles or do the heavy lifting integrating disjointed technologies, you can speed time to value and reduce TCO. SecureX will enable better collaboration across SecOps, NetOps, and ITOps – and ultimately simplify your threat response.

Saturday, 25 April 2020

Cisco Helps Competitive Carriers Deliver 5G Service Agility

5G promises revolutionary new consumer experiences and lucrative new business-to-business (B2B) services that were never possible before: wireless SD-WANs, private 5G networks, new edge computing cases, and many others. Actually delivering these groundbreaking services, however, will require much more than just new 5G radio technology at cell sites. It will take very different capabilities, and a different kind of network, than most service providers have in place today.

Ultimately, you need a “service-centric” network—one that provides the flexibility and control to build differentiated services, rapidly deliver them to customers, and manage them end-to-end—across both wireless and wireline domains. What does a service-centric network look like? And what’s the best way to get there from where you are today? Let’s take a closer look.

Building a Service-Centric Network


Viewing the media coverage around 5G, you might think the revolution begins and ends with updating the radio access network (RAN). But that’s just one piece of the puzzle. Next-generation services will take advantage of the improved bandwidth and density of 5G technology, but it’s not new radios, or even 5G packet cores, that make them possible. Rather, they’re enabled by the ability to create custom virtual networks tuned to the needs of the services running across them. That’s what a service-centric network is all about.

When you can tailor traffic handling end-to-end on a per-flow basis, you can deliver all manner of differentiated services over the same infrastructure. And, when you have the end-to-end automation that service-centric networks imply, you can do it much more efficiently. Those capabilities go much deeper than the radios at your cell sites. Sure, adding 5G radios will improve last-mile speeds for your customers. But if you’re not evolving your end-to-end infrastructure towards service-centric principles, you won’t be able to deliver net-new services—or tap new B2B revenue streams.

Today, Cisco is helping operators of all sizes navigate this journey. We’re providing essential 5G technologies to help service providers like T-Mobile transform their networks and services. (In fact, Cisco is providing the foundational technology for T-Mobile’s non-standalone and standalone 5G architecture strategy.) At the same time, we’re building on our legacy as the leader in IP networking to unlock new transport, traffic handling, and automation capabilities. At the highest level, this evolution entails:

1. Implementing next-generation IP-based traffic handling

2. Extending IP all the way to endpoints

3. Laying the foundation for end-to-end automation

Optimizing Traffic Management


As the first step in building a service-centric network, you should be looking to further the migration of all network connections to IP and, eventually, IPv6. This is critical because IP networks, combined with technologies such as MPLS, enable multi-service networks with differentiated traffic policies. Without advanced traffic management, you can’t provision, monitor, and assure next-generation services under service-level agreements (SLAs), which means you can’t tap into lucrative consumer and business service revenue opportunities.

Today, most operators manage traffic via MPLS. Although MPLS has been highly effective at enabling traffic differentiation, it has complexity issues that can impede the scale and automation of tomorrow’s networks. Fortunately, there’s another option: segment routing. Segment routing offers a much simpler way to control traffic handling and policy on IP networks. And, by allowing you to programmatically define the paths individual services take through the network, it enables much more efficient transport.

Many operators have deployed segment routing and are evolving their networks today. You can start now even in “brownfield” environments. Cisco is helping operators implement SR-MPLS in a way that coexists with current architectures, and even interoperates with standards-based legacy solutions from other vendors. Once that foundation is in place, it becomes much easier to migrate to full IPv6-based segment routing (SRv6) in the future.

Extending IP


As you are implementing segment routing, you should go one step further and extend these new service differentiation capabilities as close to the customer as possible. This is a natural progression of what operators have been doing for years: shifting almost all traffic to IP to deliver it more effectively.

Using segment routing in your backhaul rather than Layer-2 forwarding allows you to use uniform traffic management everywhere. Otherwise, you would have to do a policy translation every time a service touches the network. Now, everything uses segment routing end to end, instead of requiring different management approaches for different domains. You can uniformly differentiate traffic based on needs, applications, even security, and directly implement customer SLAs into network policy. All of a sudden, the effort required to manage services and integrate the RAN with the MPLS core is much simpler.

The other big benefit of moving away from Layer-2 forwarding: a huge RAN capacity boost. Layer-2 architectures must be loop-free, which means half the paths coming off a radio node—half your potential capacity—are always blocked. With segment routing, you can use all paths and immediately double your RAN bandwidth.

Building Automation


As you progress in building out your service-centric network, you’re going to be delivering many more services. And you’ll need to manage more diverse traffic flows with improved scale, speed, and efficiency. You can’t do that if you’re still relying on slow, error-prone manual processes to manage and assure services. You’ll need to automate.

Cisco is helping service providers of all sizes lay the foundation for end-to-end automation in existing multivendor networks. That doesn’t have to mean a massive technology overhaul either, with a massive price tag to go with it. You can take pragmatic steps towards automation that deliver immediate benefits while laying the groundwork for much simpler, faster, more cost-effective models in the future.

Get the Value You Expect from 5G Investments


The story around 5G isn't fiction. This really is a profound industry change. It really will transform the services and revenue models you can bring to the market. But some things are just as true as they always were: you don't generate revenues from new radio capabilities; you generate them from the services you can deliver across IP transport.

What’s new is your ability to use next-generation traffic handling to create services that are truly differentiated. That’s what the world’s largest service providers are building right now, and it’s where the rest of the industry needs to go if they want to compete and thrive.

Let Cisco help you build a service-centric network to capitalize on the 5G revolution and radically improve the efficiency, scalability, and total cost of ownership of your network.

Thursday, 16 April 2020

Time Series Analysis with ARIMA: Part 3

XI. Cisco Use Case – Forecasting Memory Allocation on Devices


Background: Cisco devices have a resource limit (rlimit), based on their OS and platform type. When a device reaches its resource limit, it might indicate a problem in the network or a memory leak. High water mark is used as a buffer against the resource limit: it is a threshold that, if it approaches the resource limit, indicates a problem in the network. For more information on high water mark, here is a useful presentation on what it means for routers and switches.

Problem: The client wants to track and forecast high water mark over a two-year period. We need to determine if/when high water mark will cross the resource limit two years into the future.

Solution: To solve this problem, we used time series forecasting. We forecasted high water mark over a two-year period.

A lot of effort went into thinking about the problem. I will put the general steps we took in solving this before going into the details. Notice that most of these steps were mentioned or highlighted in the first two blogs!

1. First, we looked at and plotted data
2. Second, we formulated hypotheses about the data and tested the hypotheses
3. Third, we made data stationary if needed
4. Fourth, we trained many separate model types and tested them against future values for validation
5. Finally, we gave conclusions and recommendations moving forward

1) Looked at and Plotted Data

When we initially received the dataset, we received numerous variables in addition to high water memory utilization over time. We received data for 65 devices, and for each device we were given about 1.5 years of data in monthly snapshots. Below is information on one example Cisco device. Note the variables labeled X and Y (timestamp and "bgp_high_water"). These are the variables we are interested in using for our forecast. The "bgp_high_water" variable is represented in bytes.

Figure: sample device record showing the timestamp (X) and bgp_high_water (Y) variables

We then plotted the data for all devices. Below is an example device highlighting the importance of visualization. How you look at the data can make an immense difference in how you interpret it. The graph on the right is plotted with respect to the resource limit, which makes the variability in bytes look much less extreme than it appears on the left. These graphs show the steepness of the rise of high water mark over time and became critical when interpreting forecasts and predictions with different models.

Figures: high water mark over time for an example device, plotted on a raw byte scale (left) and relative to the resource limit (right)

2) Formulated hypotheses about the data and test the hypotheses

We faced two big challenges when we looked at these data points on a graph.

Challenge #1: The first challenge we faced was related to the number of data points per device. We only had monthly snapshots of memory allocation over ~1.5 years. This means we had about 15 data points on average per device. For a successful time series analysis, you need at least 2 to 3 years' worth of data to predict 2 years out, which would mean we would need at the very least 24-36 data points per device. Because we didn't have much data, we could not confidently or reliably trust the seasonality and trend detection tests we ran. This issue can be resolved with more data, but at the time we were given the problem, we could not trust the statistics completely.

Challenge #2: The second challenge we faced was related to how we could use extra variables as leading indicators for a multivariate model. The red continuous variables below were all of the variables we could consider as leading indicators for high water mark. We had a wealth of choices and decisions to make, but we had to be extremely cautious about what we could do with these extra potential regressors.

Figure: candidate leading-indicator variables (in red) alongside high water mark

Based on these challenges, we made four formal hypotheses about the data we were given.

Hypothesis 1: Given the current dataset, we cannot reliably detect trend or seasonality on a device.

We failed to reject the null hypothesis that there is no trend. Statistically, we found trending data in some devices – upward and downward. However, because of the number of data points, we felt there was not a strong enough signal to suggest there was a trend in the data. After removing trend, the data became statistically stationary. We detected no seasonality.

To make this conclusion, we plotted our data and ran the ADF and KPSS tests (mentioned in part 2 of the blog series) to inform our decisions. For example, let’s take a look at this particular device below. Visually, we see the data has some trend down, but it is not by many bytes. Additionally, we could see that for seasonality there is not much there. As mentioned before, we needed at least 2-3 solid years to detect seasonality or at least definitively say there is seasonality in the data. When we ran the ADF and KPSS tests, the results suggested that the data was non-stationary, but because there was so little data at the time, we believed it would not make a difference for our models if we made data stationary or non-stationary.
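
For reference, a minimal sketch of those two tests in Python with statsmodels, run here on a synthetic stand-in for one device's monthly snapshots (the real data is not reproduced):

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

# Synthetic stand-in for ~18 monthly high-water-mark snapshots (bytes).
rng = np.random.default_rng(0)
series = pd.Series(5_000_000 + rng.normal(0, 50_000, 18).cumsum())

adf_stat, adf_p, *_ = adfuller(series, maxlag=3)                     # null: unit root (non-stationary)
kpss_stat, kpss_p, *_ = kpss(series, regression="c", nlags="auto")   # null: stationary around a constant

print(f"ADF  p-value: {adf_p:.3f}  (suggests stationarity if p < 0.05)")
print(f"KPSS p-value: {kpss_p:.3f}  (suggests stationarity if p > 0.05)")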

Figure: high water mark over time for an example device, showing a slight downward drift

Hypothesis 2: There is no significant correlated lag between the PATH variables and BGP High Water

We found 3 features that were highly correlated with high water mark. They were IPV4, IPV6, and Speaker AS Paths. Because they were highly correlated, we thought these features could be used as leading indicators to high water mark. However, on closer inspection, that was not the case. For a variable to be a leading indicator, we needed the variable of interest to be highly correlated with high water mark at a lag point. For example, say the point at IPV4 rises by 1000 bytes in month 10, and high water mark will rise by some amount in Month 12. For IPV4 to be a leading indicator of high water mark, you would need to see this 2 month rise consistently across time.

Notice the plots below on the right. For the IPV4 variable, we lagged the data points by each month and then looked at the correlation coefficients to high water mark. Notice that you see a high correlation between the IPV4 points and the high water mark at the first data point, also called lag 0. The correlation coefficient is around 0.75. However, this correlation won’t tell us anything other than that when IPV4 paths rise, high water mark rises at the exact time and vice versa. If you look at the data points after the first one, you will see the correlation with high water mark go down and there is a slight dip before going back to no correlation at all. Because of this, we ruled IPV4 out as a leading indicator.

If you look at the relation between IPV6 and high water mark, at around 12 months lag, you see a correlation rise to about 0.6. IPV6 looks like an interesting variable at first sight, but with a little domain knowledge, you would understand that this is also not possible. Does it really seem possible for the example device below that when IPV6 paths increase, high water mark increases a year later? If we had 2 years of data and we saw that rise happen again, we might think there is something there, but we did not have enough data to make that call. Now think about all of this for forecasting. We could not predict out a month, let alone 2 years, if the indicator we were basing our model on rose at the same time as the target variable. We therefore did not consider any variables for multivariate forecasting, but we wanted to leave little doubt about it. This leads to hypothesis 3.
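
A minimal sketch of that lag-correlation check in pandas, using synthetic data and illustrative column names rather than the actual device dataset:

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "ipv4_paths": rng.normal(800_000, 20_000, 18),        # candidate leading indicator
    "bgp_high_water": rng.normal(5_000_000, 100_000, 18),  # target variable
})

# Correlate high water mark with the indicator shifted back by 0..12 months.
# A strong peak at some lag k > 0 would suggest a usable leading indicator.
lag_corr = {
    lag: df["bgp_high_water"].corr(df["ipv4_paths"].shift(lag))
    for lag in range(13)
}
print(pd.Series(lag_corr).round(2))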

Figures: correlation of IPV4 and IPV6 path counts with high water mark at increasing lags

Hypothesis 3: High water mark is a function of the paths (IPV4, IPV6, and Speaker_AS), so they cannot be used as explainers/regressors for high water mark

The high correlation of these variables might also indicate that high water mark is a direct function of them, meaning that if we added these variables together and multiplied by some constant, we would get the high water mark value in bytes. The histograms below highlight the variability in this constant for each device. If this variability is low, it is an indication that high water mark is essentially a direct function of these variables. We confirmed this with the following formula:

(IPV4+IPV6+Speaker)*X = high water mark or IPV4+IPV6+Speaker = High Water / X, where X is the constant of interest.

Notice the plots below for three example devices. We plotted X at each time point as a histogram. Notice there is very little variability in X. Each device’s constant is centered around a particular mean. We therefore concluded that with the leading indicators of interest, we could not use a multivariate model to predict high water mark. We would need a leading indicator that is not directly related to our target variable. A good example of a leading indicator in this scenario would be the count of how many IP Addresses were on the network at a single time.
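
A minimal sketch of this check, again on synthetic data with illustrative column names; the target is constructed as a multiple of the path counts so the spread of X comes out near zero:

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
paths = pd.DataFrame({
    "ipv4": rng.normal(800_000, 5_000, 18),
    "ipv6": rng.normal(200_000, 2_000, 18),
    "speaker_as": rng.normal(50_000, 1_000, 18),
})
high_water = paths.sum(axis=1) * 4.7 + rng.normal(0, 1_000, 18)  # synthetic target

# X = high_water / (ipv4 + ipv6 + speaker_as); low spread suggests a direct function.
x = high_water / paths.sum(axis=1)
print(f"mean X = {x.mean():.3f}, std X = {x.std():.4f}")
# A near-constant X means the paths essentially *are* the high water mark,
# so they cannot serve as independent regressors in a multivariate model.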

Figures: histograms of the constant X for three example devices

Hypothesis 4: There will be no significant difference in MAE and MAPE from a Baseline ARIMA Model ARIMA (1, 1, 0) or ARIMA(1, 0, 0)

We then decided that we would try several different model types on our train-test split to see which had the best performance. We made our metric for best performance the lowest mean absolute error and mean absolute percentage error on the test set of a train-test split. We then evaluated the models further by forecasting 2 years out and computing the MAE and MAPE up to the most recent data point available at the time we received the data (6 months). The next sections highlight the outcomes of hypothesis 4.

3) Made data stationary if needed

While we did not think we could reliably detect trend, we decided to run a model on differenced “stationary” datasets anyways. We also made models with the multivariate approach to see performance. We made a prior assumption that it would perform worse than a baseline ARIMA. Below is an example device plotted with its non-differenced and differenced data. All other device plots follow similar patterns.

Figure: non-differenced and differenced data for an example device

4) Trained many separate model types and tested them against future values for validation

Below are the results of our initial analysis given the data on each device. We tried many different approaches, including ARIMA with a baseline model, Facebook’s Prophet model, ARIMAX, and an exponential smoothing model called local linear trend. Given the constraints of the data at the time, our best approach was a baseline ARIMA model. Results showed that across all devices, models using a baseline ARIMA with parameters p=1, d=0, and q=0 had the lowest MAE and MAPE. Given more data and time to detect the systematic components, we would likely have seen better results with more complex and smoother models. However, given the data we had so far, a simple ARIMA model performed really well!
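
A minimal sketch of the kind of evaluation described above, fitting a baseline ARIMA(1,0,0) on a synthetic monthly series and scoring the held-out months with MAE and MAPE:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

rng = np.random.default_rng(3)
series = pd.Series(5_000_000 + rng.normal(0, 50_000, 18).cumsum())  # synthetic device

train, test = series[:-3], series[-3:]        # sequential split: hold out the last 3 months

model = ARIMA(train, order=(1, 0, 0)).fit()   # baseline ARIMA(1,0,0)
pred = model.forecast(steps=len(test))

mae = mean_absolute_error(test, pred)
mape = mean_absolute_percentage_error(test, pred)
print(f"MAE = {mae:,.0f} bytes, MAPE = {mape:.2%}")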

Figures: MAE and MAPE comparison across the candidate models and devices

The baseline ARIMA was not without its flaws. Given that most of the data constituted random components, it was very difficult to predict out 2 years for as little data as we had. Thus, you will see forecasting patterns similar to the examples below. Training and testing yields relatively low and good mean absolute errors, whereas forecasts yield higher and worse mean absolute error over time.

Below is an example of a sample prediction on a test set for 2-3 months out using ARIMA(1,0,0). Notice that both the training and testing Mean Absolute Error and Mean Absolute Percentage error are very low.

Figure: ARIMA(1,0,0) predictions on a 2-3 month test set

Next, is an example forecast done 6 months out without validation data using ARIMA(1,0,0). Notice the forecast is a straight line and visually you can see that the forecast error is much higher. It shows that a baseline ARIMA cannot adjust forecasts to the random change in the data. Because there was no reliable detected seasonality, cycle, or trend in the data per device, the model was trying to predict random components.

Figure: ARIMA(1,0,0) forecast 6 months out without validation data

5) Conclusions and Recommendations

As a result of our initial analyses, we chose a baseline ARIMA model over other more complicated models. In our conclusion, we recommended to the client that they probably shouldn’t be forecasting out more than a couple of months. Instead, they can do rolling predictions out every 2 months as new data comes in and then scale to more complex models once trends or seasonality are spotted in the data.
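
A minimal sketch of that rolling approach on a synthetic two-year series: forecast two months ahead, then fold the actual values back in and re-fit before the next forecast.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
series = pd.Series(5_000_000 + rng.normal(0, 50_000, 24).cumsum())  # 24 monthly snapshots

history = list(series[:12])                        # start with the first year
rolling_preds = []
for t in range(12, len(series), 2):
    fit = ARIMA(pd.Series(history), order=(1, 0, 0)).fit()
    rolling_preds.extend(fit.forecast(steps=2))    # two-month-ahead forecast
    history.extend(series[t:t + 2])                # fold the actuals back in before re-fitting
print(len(rolling_preds), "rolling predictions")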

For some good news on the project, we have recently gotten more data, and we have been able to integrate more complex models that utilize trend data. In our latest dataset for this use case, we have been able to spot trends and been able to make 2 year predictions with much more confidence using a model approach called local linear trend. We used this in our initial model approach but did not have much confidence in it given the data constraints at the time. Fortunately, time series models in general are very flexible given that the data input is simply time and a target variable. Model adaptations are therefore very fast and easy to implement once you understand the basics of time series analysis and have enough data to spot trends and seasonality.

XII. Designing Your Own Time Series Forecasts


Let’s end the blog with some rules you can follow to develop your own time series analysis. This will give you the confidence to start thinking about time series in a new way.

1) Always plot and decompose your data

Always plot and decompose your data. Even if you don’t think there are any trends or seasonality, it’s always good to look at your data!

2) Construct hypotheses about your data after looking at it.

Always construct hypotheses about your data after you look at it. You want to do this so that you can formalize what you think is going on in your datasets and test your assumptions about them. For example, suppose you were asked to predict out 2 years with only monthly snapshots of data. Maybe you don't think you have enough data to detect trend or seasonality. It is possible that you will find no systematic patterns of trend and seasonality in your dataset. If you do not, know that you will most likely be predicting only the random components of your dataset. This will help you make strong, testable conclusions later when presenting to stakeholders.

3) For forecasting, make sure your data are stationary. For some smoothing models you may not need to do this.

Remember the definition and rules of stationarity when using models like ARIMA!

4) If multiple variables are present in your data, try to determine the usefulness of using them in your multivariate time series model.

Can other continuous variables be used as leading indicators or signals to help you predict the next data point? If not, don’t use them! They likely won’t help your forecasts, and might actually make them worse.

5) Choose models that make sense for the data you are given, and don’t be afraid to experiment with model parameters.

The autoregressive and the moving average terms may sometimes cancel each other out under certain conditions. Refer to the post below to help you understand your data better and pick the appropriate lag parameters.

6) Compare model parameters and model types. You will likely find that simpler models are usually better.

Always try using simple models first as a baseline, like ARIMA(1,0,0) or ARIMA(1, 1, 0). Then you can add more complexity if your data has complex systematic components like trend and seasonality. You will likely get better prediction accuracy and lower forecast error as you get more data. Additionally, if you don’t need multivariate analysis, don’t use multivariate analysis. Only use it if you think it will improve your forecasts.

Tuesday, 14 April 2020

Time Series Analysis with ARIMA: Part 2

This is a continuation of the Time Series Analysis posts. Here, I will do a deep dive into a time series model called ARIMA, an important smoothing technique used commonly throughout the data science field.

If you have not read part 1 of the series on the general overview of time series, feel free to do so!

VII. ARIMA: Autoregressive Integrated Moving Average


ARIMA stands for Autoregressive Integrated Moving Average. These models aim to describe the autocorrelations in the data. You can use these correlations to predict future values based on past observations and forecast errors. Below are ARIMA terms and definitions you must understand to use ARIMA!

1) Stationarity: One of the most important concepts in time series analysis is stationarity. Stationarity occurs when a shift in time doesn’t change the shape of the distribution of your data. This is in contrast to non-stationary data, where data points have means, variances and covariances that change over time. This means that the data have trends, cycles, random walks or combinations of the three. As a general rule in forecasting, non-stationary data are unpredictable and cannot be modeled.

To run ARIMA, your data needs to be stationary! Again, a time series has stationarity if a shift in time doesn't cause a change in the shape of the distribution. Basic properties of the distribution like mean, variance, and covariance are constant over time. In layman's terms, you need to induce stationarity in your data by "removing" the systematic components so that the data appear random. This means you must transform your non-stationary dataset to use it with ARIMA. There are two different violations of stationarity, but they are outside the scope of this post; to understand them, please look at this post: understanding stationarity. There are two common techniques to induce stationarity, and ARIMA conveniently builds one of them, differencing, into the model itself. Two tests, the ADF test and the KPSS test, can check whether your data is stationary. After running the tests, induce stationarity by transforming your data appropriately until it is stationary.

2) Differencing: A transformation of the data that involves subtracting the value at time t-p from the point at time t, where p is a specified lag. A differencing of one means subtracting the value at time t-1 from the point at time t to make the data stationary. The graph below applies a differencing order of 1 to make the data stationary. All of this can be done in many coding libraries and packages.
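
For example, in pandas a lag-p difference is a one-liner (the series below is a synthetic trending example):

import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
series = pd.Series(np.linspace(100, 160, 60) + rng.normal(0, 2, 60))  # upward trend + noise

first_diff = series.diff(1).dropna()      # y_t - y_{t-1}: removes a linear trend
seasonal_diff = series.diff(12).dropna()  # y_t - y_{t-12}: removes a yearly seasonal pattern
print(first_diff.head())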

Figure: a series before and after first-order differencing

3) Autoregressive Lag: These are the historical observations of a stationary time series

The Autoregressive part in ARIMA is tied to historical aspects of the data. 1 autoregressive lag is the previous data point. Two autoregressive lags refers to two previous data points and so on. This is a critical component to ARIMA, as this will tell you how many of the previous data points you would like to consider when making the next predicted data point. Useful techniques for determining how many autoregressive lags to use in your model are autocorrelation and partial autocorrelation plots.

As an example, see these autocorrelation plots and partial autocorrelation plots below. Because of the drop-off after the second point, this indicates you would use 2 autoregressive lags in your ARIMA model.
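
A minimal sketch of producing those two plots with statsmodels on a toy stationary series:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(6)
series = pd.Series(rng.normal(0, 1, 200)).rolling(3).mean().dropna()  # toy stationary series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=20, ax=axes[0])    # bars dropping inside the band suggest the MA order
plot_pacf(series, lags=20, ax=axes[1])   # a sharp cut-off after lag k suggests AR(k)
plt.tight_layout()
plt.show()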

Figures: autocorrelation and partial autocorrelation plots with a drop-off after the second lag

4) Moving Average Lags: This is related to historical forecast error windows.

The moving average lags refer to the size of the window you want to use for computing your prediction error when training your model. Two moving average lags means you are using the average error of the previous two data points to help correct the predictions on the data point you are predicting next! For the moving average lags, you specify how big your window size will be. These window sizes will contribute to how many data point errors you want to use for your next prediction. Again, it is useful to determine how many lags you use with autocorrelation and partial autocorrelation plots.

5) Lag Order: This tells us how many periods back we go.  For example, lag order 1 means we use the previous observation as part of the forecast equation.

VIII. Tuning ARIMA and general equation

Now that you know general definitions and terms, I will talk about how these definitions tie into the ARIMA equation itself. Below is the general makeup of an ARIMA model, along with the terms used for calibrating and tuning the model. Each parameter will change the calculations done in the model.

Below are the general parameters of ARIMA:

ARIMA(p, d, q) ~ Autoregressive Integrated Moving Average(AR, I, MA)
p – order of the autoregressive lags (AR Part)
d – order of differencing (Integration Part, I)
q – order of the moving average lags (MA Part)

Below is the general formula for ARIMA that shows how the parameters are used. I will break down each parameter and how they fit into the equation.
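
In one common textbook form, writing y'_t for the series after d rounds of differencing, \varepsilon_t for the forecast errors, and c for a constant:

\hat{y}'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}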

1) p – order of the autoregressive lags (AR Part)

When p=2 and everything else is 0, i.e. ARIMA(2,0,0), you are using the 2 previous data points to contribute to your final prediction. This can be noted in the equation below, which is a subset of the entire ARIMA equation.
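
In the same notation as above, the ARIMA(2,0,0) forecast reduces to the autoregressive part alone, with the constant c playing the role of the level:

\hat{y}_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2}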

The equation gives you a forecast at a particular time if you use p to the order of 2 autoregressive lags. It uses the previous 2 data points and the level at that point in time to make the prediction. For example, the red values below are used to forecast the next point, which would be the first data point on the green line.

Figure: the two previous (red) data points used to forecast the next (green) point

2) d – order of differencing (Integration Part, I)

The next parameter in ARIMA is the d parameter, which is also the differencing or integration part. As mentioned earlier, you need to difference your data to make it stationary. When you have non-stationary data, ARIMA can apply differencing until your data is stationary. The d term in the ARIMA model does this differencing for you. When you apply d=1, you are doing first-order differencing, which means you difference once. If you apply d=2, you difference twice. You only want to difference enough that the data finally becomes stationary. As I mentioned before, you can check whether your data is stationary using the ADF and KPSS tests. The equations for differencing are below; notice that third-, fourth-, up to nth-order differencing can be applied.
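
In a common notation, the first- and second-order differences are:

y'_t = y_t - y_{t-1}
y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}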

When you apply first-order differencing and don't make any changes to the autoregressive lags or moving average, you get ARIMA(0,1,0), also called a random walk. This means your model treats the differenced series as pure noise: previous data points carry no information beyond the most recent observation, so the forecast simply extends from the last value (plus an optional constant drift).

3) q – order of the moving average lags (MA Part)

Finally, we will talk about the q term. The q term is the moving average part and is applied when you want to look at your prediction error. You will use this error as input for your final forecast at time t. This will be relevant when you are training the data. You can use this parameter to correct some of the mistakes you made in your previous prediction to use for a new prediction. Below is the equation used on the error terms and is the last portion of the general ARIMA equation above:
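
In a common textbook form, with \varepsilon_{t-k} denoting the forecast error made k periods earlier, the moving average contribution for ARIMA(0,0,q) is:

\hat{y}_t = c + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}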

As you can see below, with ARIMA(0,0,3), the three red data points indicate the window size you will use to help make a prediction on the next point. The next forecasted point from the three red points would be the first data point on the green line.

Figure: the three previous (red) error terms used to forecast the next (green) point with ARIMA(0,0,3)

IX. Measuring Whether a Forecast Is Good or Not

1) Train/Test Splits

Now that you know all the components of ARIMA, I will talk about how to make sure your forecasts are good. When you are training a model, you need to split your data into train and test sets. This lets you evaluate the model on the test set, a set of values not seen during model fitting. As opposed to other classical machine learning techniques, in which you can split your data randomly, a time series requires a sequential train-test split. Below is an example of a typical train-test split.
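
A minimal sketch of such a split in Python, on a toy price series; note that the test set is always the most recent slice, never a random sample:

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
prices = pd.Series(1000 + rng.normal(0, 10, 500).cumsum())  # toy daily closing prices

split = int(len(prices) * 0.8)               # keep the last 20% for testing
train, test = prices[:split], prices[split:]
print(len(train), "train points,", len(test), "test points")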

Figure: a sequential train-test split of a time series

2) Model Forecast Error

After you have finished training your model, you need to know how far your predictions are from the actual values. That is where you introduce error metrics. For the predictions on the test set, you can calculate how far off your predictions were from the actual values using various error metrics. Your goal when making a forecast is to reduce this error as much as possible. This is important as it will tell you how good the forecast is. Additionally, knowing your forecasting error will also help you tune your parameters on the ARIMA model should you want to make changes to the model.

The metrics we generally recommend for time series are mean absolute error and mean absolute percentage error. Mean absolute error (MAE) is a single number that tells you, on average, how far your predictions are from the actual values. Mean absolute percentage error (MAPE) is another metric we use; it is the mean absolute error expressed as a percentage of the actual values. This will tell you how "accurate" your model is. The equations for MAE and MAPE are below, as well as a plot of Google stock predictions on a train-test split. You can calculate the error on the predictions using these equations. Notice that you will use the forecasts on the purple line and the red data points to help calculate MAE and MAPE for your test set.
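
In formula form, with y_t the actual value, \hat{y}_t the prediction, and n the number of test points:

\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \lvert y_t - \hat{y}_t \rvert
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left\lvert \frac{y_t - \hat{y}_t}{y_t} \right\rvert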

Figure: Google stock train-test split with model predictions

In the example plot above, the line represents your model fit and predictions while the dots represent the actual data. To get the mean absolute error, you take the absolute difference between each prediction (the points represented as a line in this graph) and the actual data point, sum those differences up, and divide by the total number of points. For the forecast on the test set here, the mean absolute error was $72.35, which means that on average, each prediction was off by around $72.35. Additionally, the mean absolute percentage error is 5.89%, which tells us that the overall "accuracy" of the model is around 94.11%.

Overview of Steps for tuning ARIMA

Now that you know all of the steps in detail, below I will overview how you want to think about each parameter and steps you would take to train your ARIMA model.

1) Identify the order of differencing, d, using stationarity tests.

2) Identify the order of the autoregressive term, p, using partial autocorrelation (PACF) plots as a guide.

3) Identify the order of the moving average term, q, using autocorrelation (ACF) plots as a guide.

4) Optimize models to minimize error on test data using mean absolute error and mean absolute percentage error after doing a train-test split.

X. Multivariate Forecasting: A Brief Glimpse

Now that you know the basics of tuning ARIMA, I want to mention one more interesting topic. Everything detailed above was in concern of forecasting on one variable. This is called univariate time series. Another important concept arises when you want to predict more than one variable. This is called multivariate forecasting. This will be an important concept that I talk about in Part 3 of the blog series about time series, where I introduce a Cisco use case.

Why would you want to introduce more variables into a time series? There is a chance that other variables in your dataset help explain or help predict future values of your target variable. We call these leading indicators. A leading indicator moves before your target variable does, giving you an early signal to pay attention!

For example, let’s say you own an ice cream shop and it is summertime. PG&E cuts off your electricity. You can probably predict that in the future, ice cream sales will go down. You have no electricity to store and make your ice cream in the sweltering heat. The turning off and turning on of electrical power would be a great example of a leading indicator. You can use this indicator to supplement the forecasting of your sales in the future.

There are plenty of Multivariate ARIMA variations, including ARIMAX, SARIMAX, and Vector Autoregression (VAR). I will talk about ARIMAX briefly in the next post.

Saturday, 11 April 2020

Time Series Analysis with ARIMA: Part 1

PART 1: Introduction to Time Series


At Cisco, our partners and clients want ways to track and monitor their Cisco routers, switches, and other such devices. An important avenue of my work as part of the Customer Experience Data Incubation Team is to help track device utilization over time. One such way to think about how device utilization changes over time is to frame it as a time series. In this blog post, I will give a full break down of time series and ARIMA, why it is important, what it is, and how to use it – with a Cisco use case as well! This blog post will give a picture of some of the work the Data Incubation Team has done as part of the Customer Experience portfolio.

I. What is a Time Series?

So, what is a time series? It's actually a very simple concept. A time series is simply a set of values of the same entity observed over time, typically at equally spaced intervals. It can be monthly, yearly, weekly, daily, hourly, or by the minute. A few examples of a time series include weekly gas prices, yearly temperature change, hourly passenger counts on the subway, and any stock market app you look at. Below is an example of a time series using Google's stock. I will use this example for the majority of the blog.

Figure: Google stock price over time

II. Why Do We Care About Time Series?

So why is understanding time series data important? If you want to predict something in the future or understand trends over time, you will want to use time series analysis. For example, maybe you want to track sales and predict sales in the future. Maybe you want to breakdown your sales over time to see if there is a trend or cycle associated with it. Any sort of data tracked over time can be used for time series analysis! Below is another example of time series, which tracks the hourly bicycle count.

Figure: hourly bicycle count over time

III. Components of a Time Series

Now that you know what and why of time series, let’s break down its components. This will be important when we start talking about ARIMA in the next post.

Let’s say you have your observed values, D. These observed values, D, can actually be broken down into 2 main components: Systematic components and Random components. Systematic components are data that can be forecasted, while random components are data that cannot be forecasted. I will break down both the systematic components and random components in a series of definitions below.

◉ Systematic Components, S – Data that can be forecasted. Systematic components can be further broken down into 3 parts.

◉ Level, L – It is the intercept of the straight-line approximation of the current observed values D, like a regression line or line of best fit. Level is generally used as initial input to forecast models.

Figure: the level component

◉ Trend, T – It is the slope of the rate of growth or decline of your observed values, D. This slope or rate will decline, incline, or be constant throughout the time series.

Figure: the trend component

◉ Seasonality, S, or Cycles – These are the predictable seasonal or non-seasonal fluctuations in your observed values, D. In other words, your data has seasonality if it has variations that occur at regular intervals (weekly, monthly, etc.) throughout a year. For example, prices for Nintendo Switch consoles and games drop every 3 months, then come back up after a week. This is considered a seasonal component.

Figure: the seasonal component

◉ Random Components, R – This might be anomalous behavior, irregularities in the data, and unexplained variation. These are all things that typically cannot be controlled, and they are inevitable in almost every dataset.

Figure: the random component

IV. Main Goals when Given Time Series

Now that you know what a time series is and the components, you may be wondering what you can do with it. When given a time series, you either want to decompose the components of your time series data or forecast and make predictions based on your data. Let’s talk about both techniques below.

◉ Decomposition: This is the breakdown of data into sub-components, including trend, seasonality, and randomness and can be done to look at important parts of the time series. Maybe sales on your services have a seasonal or cyclical component to them and you want to use that to improve sales at a certain part of the season. That is where decomposing a time series can be helpful. You can visualize and identify specific factors and trends in your data that impact its growth or decline. Below is a breakdown of the components of Google’s stock.

Figure: decomposition of Google stock into trend, seasonal, and random components
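
A minimal sketch of such a decomposition in Python with statsmodels, on a toy monthly series (not the Google data shown above):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Toy monthly series with an upward trend and a yearly seasonal pattern.
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
rng = np.random.default_rng(8)
values = np.linspace(100, 200, 60) + 10 * np.sin(2 * np.pi * idx.month / 12) + rng.normal(0, 3, 60)
series = pd.Series(values, index=idx)

result = seasonal_decompose(series, model="additive", period=12)
result.plot()   # panels: observed, trend, seasonal, residual (the random component)
plt.show()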

◉ Forecasting: Another goal of time series is forecasting the future. For example, you may want to predict when some hardware or device might crash in the future based on their historical data. This can help companies make proactive or preventative measures to fix the problem before it happens instead of reacting to the problem as it happens. As a result, this can save time and money for companies and clients. Below is an example of the forecast of Google stocks given its current seasonality, cycles, and trends.

Figure: forecast of Google stock based on its trend and seasonality

V. Forecasting Rules of Thumb

Now that you understand some of the cool things you can do with time series, I will now go over rules that are critical to know if you want to do forecasts on your data.

Rule #1 – Always plot your data before, during, and after forecasting!

You always want to check how the data is distributed over time or how the model is forecasting by plotting the data. The process is quick and gives an idea on how to approach the problem or make adjustments to the model.

Rule #2 – You can only forecast the systematic components of the observed data – Level, Trend, Seasonality

You may not predict the future very well if you do not see any of the systematic components of trend, seasonality, or cycles after decomposing your time series. There may be a promising project you work on that has uneven and irregular data. For example, maybe the stock price swings because someone sends out an innocuous tweet. You can see how that tweet impacted your time series by looking at the residuals, or the random components. This type of swing is something you will likely not be able to predict.

Rule #3 – The random components, R, cannot be predicted

As mentioned before, random components are sudden changes occurring in a time series which are unlikely to be repeated. They are components of a time series which cannot be explained by trends, seasonal or cyclic movements and they are usually not repeated. For example, during times of the coronavirus, stock prices were very volatile and while there was a general downward trend, much of the day-to-day activity was random. If your data only have random components, it will be harder for you to make an intelligent time series forecast.

VI. General Forecasting Techniques – Univariate Time Series

Now that you understand some important concepts for forecasting, I will outline two different forecasting techniques used as industry practice today, starting from simple regressions to smoothing.

◉ Regressions find a straight line that best fits the data. This is also known as static forecasting.

1. EX: Least Squares (using linear regression)

Figure: a least-squares regression line fit to a time series

◉ Smoothing determines the value for an observation as a combination of surrounding observations. This is also known as adaptive forecasting. ARIMA utilizes smoothing methods. Smoothing has additional tools that a simple regression does not have and makes modeling more robust. Smoothing techniques are more commonly used today, but regressions are often useful to get a general idea of how your data is moving.

1. EX: moving average, exponential smoothing models, ARIMA models

Figure: a smoothing-based forecast of a time series