Servers, processes, and network links periodically fail, so we tested how PowerFlex handles these types of failures. For our tests, we used a PowerFlex R740xd six-node cluster with three SSDs per storage pool. Replication was active on both storage pools when we induced the failures.
We started with a baseline workload, as shown in the following figure:
We then failed an SDR, observed the impact, and later restarted it to observe the recovery. The following figure shows the results:
We performed the same test for an SDS failure. The results show that the system handled the workload comfortably with five active SDS nodes:
Figure 25. Performance with SDS failure
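To illustrate why five nodes absorbed the failure so easily, the following minimal sketch estimates how much data each surviving SDS rebuilds when one node in an N-node cluster fails. This is not PowerFlex code; it assumes data is spread evenly across nodes, and the 60 TB figure is illustrative, not from our tests.

```python
def rebuild_share(total_nodes: int, total_data_tb: float) -> float:
    """Data (in TB) each surviving node absorbs after one node fails,
    assuming an even spread and a parallel, many-to-many rebuild."""
    failed_node_data = total_data_tb / total_nodes  # even spread assumed
    survivors = total_nodes - 1
    return failed_node_data / survivors

# Example: a 6-node cluster holding 60 TB. Each of the 5 survivors
# rebuilds 2 TB of the failed node's 10 TB share.
print(rebuild_share(6, 60.0))  # -> 2.0
```

Because the rebuild work is divided among all survivors rather than landing on a single spare, each node's extra load stays small, which is consistent with the modest workload impact shown in the figures.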
Also, as expected, we saw rebalance activity:
The following figure shows performance after SDS recovery:
Figure 27. Performance after SDS recovery
Next, we failed a network link to demonstrate how the updated native load balancing affects the I/O rate. The test system has four data links between nodes. The following figure shows the results:
Figure 28. Performance during network link failure
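The behavior in the figure can be approximated with a simple model: the load balancer spreads a fixed I/O load evenly across whichever data links remain up. The sketch below is a hypothetical illustration, not the PowerFlex implementation, and the 400,000 IOPS figure is an assumed value for the example.

```python
def per_link_load(total_iops: int, links_up: int) -> float:
    """I/O carried by each active link when load is spread evenly
    across the links that remain up (hypothetical model)."""
    if links_up == 0:
        raise ValueError("no data links available")
    return total_iops / links_up

# With four links up, each carries a quarter of the load; after one
# link fails, each of the remaining three absorbs a third more traffic.
print(per_link_load(400_000, 4))  # -> 100000.0
print(per_link_load(400_000, 3))  # ~133,333 per link
```

Under this model, losing one of four links raises per-link traffic by roughly a third rather than halting I/O, which matches the limited dip observed in the test.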
After we reconnected the failed port, I/O returned to the baseline level within a few seconds, with no noticeable dip:
Figure 29. Performance after reconnection of failed port
All these failure scenarios demonstrate the resilience of PowerFlex. They also show that the system is well tuned: rebuild and rebalance activity did not severely impact our workload.