Thursday, 17 February 2022

Cisco MDS 64G SAN Analytics: Architecture evolution


Cisco recently announced software availability of NX-OS 9.2(2) with support for SAN Analytics on the Cisco MDS 9700 Series switches with 64G Modules. This software release begins the next phase in the architecture evolution of SAN Analytics.

In this blog we will do a high-level comparison of SAN Analytics Architecture between the Cisco MDS 32G and 64G platforms and look at some of the new innovations of Cisco MDS 64G SAN Analytics.

But first, let’s cover the methodology used for performance monitoring. Utilization, Saturation and Errors (USE) is a generic methodology for effective performance monitoring of any system. The USE metrics identify the performance bottlenecks of a system. In the context of a storage system, we can add Latency as an additional element to the USE methodology to create LUSE. Full visibility into the LUSE metrics of a storage infrastructure is critical for performance monitoring and troubleshooting.

SAN Analytics and SAN Insights have been advanced features of the Cisco MDS 32G switches since NX-OS 8.3(2):

◉ SAN Analytics is an advanced feature of Cisco MDS switches that collects storage I/O metrics on the switches, independent of the host and storage systems. Over 70 metrics are collected per port and per flow (ITL/ITN) and streamed out. Each of these metrics can be classified into one of the ‘LUSE’ categories.

◉ SAN Insights is a capability of Cisco Nexus Dashboard Fabric Controller (formerly DCNM) SAN that receives the metrics stream from SAN Analytics. It provides visualization and analysis of fabric-wide I/O metrics using the ‘LUSE’ framework.

Cisco MDS 32G SAN Analytics

Access Control Lists (ACLs) enforce access control on every frame switched by the ASIC. An ACL entry matches on fields extracted from the frame header, and on a match the action associated with that entry is taken. On an F-port, FC Hard Zoning entries are programmed as ingress ACLs, based on the Zoning configuration, that match on the frame SID and DID with an action to “forward” the frame to the destination.

On Cisco MDS 32G switches, the I/O metrics are computed by capturing FC frame headers in the data path using an ACL-based ‘Tap’ programmed in the ASIC in the ingress and egress directions of the analytics-enabled ports. These Tap ACLs match on the frames of interest for Analytics, namely CMD_IU, 1st DATA_IU, XRDY_IU, RSP_IU and ABTS. A copy of each frame matching the Tap ACL is forwarded to an on-board NPU connected to the 32G ASIC.

When SAN analytics is enabled on a port, the ACLs are programmed depending on the port type and direction as shown in Figure 1 below:

◉ F_Port Ingress: Analytics Tap ACLs + Zoning ACLs

◉ F_Port Egress, E_Port Ingress, E_Port Egress: Analytics Tap ACLs only

Figure 1: Port Analytics Tap and Zoning
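
The ACL placement described above can be sketched in a few lines of Python. This is only an illustrative model, not how the ASIC is actually programmed: the `Frame` class, the `process_frame` function and the frame-type labels are hypothetical names, and the sketch assumes the Tap and Zoning ACLs are evaluated independently of each other.

```python
from dataclasses import dataclass

# Frame types the Tap ACLs match on, as listed in the text.
# The string labels themselves are hypothetical.
TAP_FRAME_TYPES = {"CMD_IU", "FIRST_DATA_IU", "XRDY_IU", "RSP_IU", "ABTS"}


@dataclass(frozen=True)
class Frame:
    sid: str         # source FC ID
    did: str         # destination FC ID
    frame_type: str  # hypothetical label for the kind of frame


def process_frame(frame, port_type, direction, zoned_pairs):
    """Return (forwarded, copied_to_npu) for one frame on an analytics-enabled port.

    Assumption: Tap and Zoning ACLs are modelled as independent lookups.
    """
    # Analytics Tap ACLs are present on every port type and direction.
    copied_to_npu = frame.frame_type in TAP_FRAME_TYPES

    # Zoning ACLs are present only on F_Port ingress.
    if port_type == "F" and direction == "ingress":
        forwarded = (frame.sid, frame.did) in zoned_pairs
    else:
        forwarded = True
    return forwarded, copied_to_npu


zones = {("0x010001", "0x020001")}
cmd = Frame("0x010001", "0x020001", "CMD_IU")
print(process_frame(cmd, "F", "ingress", zones))  # zoned and tapped
print(process_frame(cmd, "E", "egress", zones))   # tapped, no zoning lookup
```

In this model only the F_Port ingress path consults the zoning table, which mirrors the two bullets above.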
 
The Cisco MDS 32G NPU software Analytics Engine can be modified to accommodate custom metrics (e.g., NVMe Flush command metrics) or future storage command sets (e.g., NVMe-KV), provided the required ACL Taps are in place.

Cisco MDS 64G SAN Analytics


The Analytics Engine moves into the ASIC on Cisco MDS 64G switches, which gives it hardware acceleration. The Cisco MDS 64G Module has two 64G ASICs and each ASIC has six hardware Analytics Engines (one for every four ports). These Analytics Engines can compute I/O metrics at line rate on all ports simultaneously, with the capacity to analyze upwards of 1 billion IOPS per Module. The hardware Analytics Engines have built-in Taps and do not need ACL-based Taps to be programmed.
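
The figures in the paragraph above imply some simple arithmetic, worked through below. The per-engine and per-port numbers are only a rough illustration: they assume the 1 billion IOPS module figure is spread evenly across engines and ports, which the source does not state.

```python
# Figures taken from the text above.
ASICS_PER_MODULE = 2
ENGINES_PER_ASIC = 6
PORTS_PER_ENGINE = 4
MODULE_IOPS = 1_000_000_000  # "upwards of 1 billion IOPS per Module"

engines_per_module = ASICS_PER_MODULE * ENGINES_PER_ASIC
ports_per_module = engines_per_module * PORTS_PER_ENGINE

# Assumption: the module-level figure divides evenly (illustrative only).
iops_per_engine = MODULE_IOPS // engines_per_module
iops_per_port = MODULE_IOPS // ports_per_module

print(f"engines per module: {engines_per_module}")
print(f"ports per module:   {ports_per_module}")
print(f"IOPS per engine:    {iops_per_engine:,}")
print(f"IOPS per port:      {iops_per_port:,}")
```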

The metrics computed by the hardware Analytics Engines are stored in a database inside the ASIC and periodically flushed to the NPU. The NPU runs a lightweight software process on top of DPDK (an open-source framework for fast, efficient packet processing) that collects and accumulates the metrics pushed periodically from the hardware Analytics Engines. Even though the NPU does not run an Analytics Engine, it maintains the persistent per-flow metrics database and remains a critical element of the solution. The shipping of metrics from the NPU database to the Supervisor is identical to the Cisco MDS 32G Architecture. The Cisco MDS 64G hardware Analytics Engine does not preclude an NPU software Analytics Engine from being enabled in a future software release for flexibility and programmability benefits.
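
To make the accumulation step concrete, here is a minimal sketch of a persistent per-flow store that folds in periodic flushes. The `FlowMetricsDB` class, the counter names and the idea that each flush carries per-interval deltas are all assumptions made for illustration; the actual format of the data pushed from the ASIC is not described in the source.

```python
from collections import defaultdict


class FlowMetricsDB:
    """Hypothetical persistent per-flow metrics store kept on the NPU."""

    def __init__(self):
        self._flows = defaultdict(
            lambda: {"io_count": 0, "total_latency_us": 0, "max_latency_us": 0}
        )

    def accumulate(self, flush):
        """Fold one periodic flush (flow key -> interval counters) into the store."""
        for flow, delta in flush.items():
            rec = self._flows[flow]
            rec["io_count"] += delta["io_count"]                  # counters add up
            rec["total_latency_us"] += delta["total_latency_us"]
            rec["max_latency_us"] = max(                          # maxima are merged
                rec["max_latency_us"], delta["max_latency_us"]
            )

    def avg_latency_us(self, flow):
        rec = self._flows.get(flow)
        if not rec or rec["io_count"] == 0:
            return 0.0
        return rec["total_latency_us"] / rec["io_count"]

    def max_latency_us(self, flow):
        rec = self._flows.get(flow)
        return rec["max_latency_us"] if rec else 0


db = FlowMetricsDB()
flow = ("initiator-1", "target-1", 7)  # hypothetical ITL flow key
db.accumulate({flow: {"io_count": 10, "total_latency_us": 2000, "max_latency_us": 400}})
db.accumulate({flow: {"io_count": 30, "total_latency_us": 2000, "max_latency_us": 150}})
print(db.avg_latency_us(flow), db.max_latency_us(flow))
```

Keeping sums and counts rather than per-interval averages lets the long-running average be derived correctly after any number of flushes.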

A comparison of the Cisco MDS 32G and MDS 64G architectures is shown in Figure 2 below:

Figure 2: Cisco MDS 32G and MDS 64G SAN Architectures

The Cisco MDS 64G hardware Analytics Engine computes some additional metrics for deeper I/O visibility:

◉ Multi-sequence write I/Os are large writes involving multiple XRDY sequences. The write exchange completion time for these writes includes delays introduced by the Host (Rx XRDYn to Tx first DATAn+1) and by the Storage (Rx Last DATAn-1 to Tx XRDYn). Tracking these delays separately makes it easier to pinpoint whether the host or the storage is responsible for a large-write performance issue. The Analytics Engine separately tracks:
    ◉ Avg/Min/Max host write delay
    ◉ Avg/Min/Max storage write delay
◉ The total busy time metric tracks the total time there was at least one outstanding I/O per-flow. This metric helps to characterize the ‘busyness’ of a flow relative to other flows.
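
The two additional metrics above can be illustrated with a small sketch. It is not the hardware algorithm: the function names and the event format are hypothetical, timestamps are assumed to be in microseconds, and the sketch assumes a host delay is measured after every XRDY seen in the exchange.

```python
def write_delays(events):
    """Split the gaps in one write exchange into host and storage delays.

    events: time-ordered list of (timestamp_us, kind), kind is "XRDY" or "DATA".
    Host delay:    an XRDY followed by the first DATA frame.
    Storage delay: the last DATA frame of a burst followed by the next XRDY.
    """
    host, storage = [], []
    prev_t, prev_kind = None, None
    for t, kind in events:
        if prev_kind == "XRDY" and kind == "DATA":
            host.append(t - prev_t)
        elif prev_kind == "DATA" and kind == "XRDY":
            storage.append(t - prev_t)
        prev_t, prev_kind = t, kind
    return host, storage


def summarize(delays):
    """Return (avg, min, max) of a non-empty list of delays."""
    return sum(delays) / len(delays), min(delays), max(delays)


def total_busy_time(ios):
    """Total time during which at least one I/O was outstanding.

    ios: list of (start_us, end_us) pairs for the I/Os of one flow.
    Overlapping I/Os are merged so that shared time is counted once.
    """
    busy = 0
    cur_start = cur_end = None
    for start, end in sorted(ios):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start  # close the previous busy period
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)      # extend the current busy period
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy


events = [(0, "XRDY"), (3, "DATA"), (4, "DATA"), (9, "XRDY"), (11, "DATA"), (12, "DATA")]
host, storage = write_delays(events)
print(summarize(host), summarize(storage))
print(total_busy_time([(0, 10), (5, 15), (20, 30)]))
```

In the example, two overlapping I/Os and one separate I/O yield a busy time shorter than the sum of the individual I/O durations, which is what distinguishes this metric from a simple latency total.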

The hardware Analytics Engine by default tracks SCSI and NVMe I/O metrics at ITL/ITN granularity. However, it can also be programmed to track metrics at other flow granularities: IT, ITL-VMID, ITN-NVMeConnectionID or ITN-NVMeConnectionID-VMID. This gives flexibility in choosing the granularity of metrics and I/O visibility.
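
One way to picture flow granularity is as the set of fields that make up the key under which metrics are aggregated. The sketch below is purely illustrative: the field names, the `flow_key` and `io_counts` helpers and the sample I/O records are hypothetical and say nothing about how the ASIC is actually programmed.

```python
from collections import Counter

# Granularities named in the text, mapped to hypothetical key fields.
GRANULARITIES = {
    "IT": ("initiator", "target"),
    "ITL": ("initiator", "target", "lun"),
    "ITN": ("initiator", "target", "namespace"),
    "ITL-VMID": ("initiator", "target", "lun", "vmid"),
    "ITN-NVMeConnectionID": ("initiator", "target", "namespace", "connection_id"),
    "ITN-NVMeConnectionID-VMID": (
        "initiator", "target", "namespace", "connection_id", "vmid",
    ),
}


def flow_key(io, granularity):
    """Build the flow key for one I/O record at the chosen granularity."""
    return tuple(io[field] for field in GRANULARITIES[granularity])


def io_counts(ios, granularity):
    """Count I/Os per flow at the chosen granularity."""
    return Counter(flow_key(io, granularity) for io in ios)


# Hypothetical SCSI I/O records from one initiator to one target.
ios = [
    {"initiator": "h1", "target": "s1", "lun": 0, "vmid": 11},
    {"initiator": "h1", "target": "s1", "lun": 0, "vmid": 12},
    {"initiator": "h1", "target": "s1", "lun": 1, "vmid": 11},
]
for g in ("IT", "ITL", "ITL-VMID"):
    print(g, dict(io_counts(ios, g)))
```

The same three I/Os form one flow at IT granularity, two at ITL and three at ITL-VMID, which shows the trade-off between the number of tracked flows and the depth of visibility.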

The 1GbE analytics port on the Cisco MDS 64G Module can stream the per-flow metrics directly (without involving the Supervisor) in either an ASIC-native format or the standard gPB/gRPC format. This can serve future use cases that need visibility into micro telemetry events and therefore require high-frequency telemetry streaming.

Source: cisco.com
