Servers, processes, and network links fail from time to time, so we tested how PowerFlex handles these failure types. For these tests, we used a six-node PowerFlex R740xd cluster with three SSDs per storage pool. Replication was active on both storage pools at the time of each failure.
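The figures that follow chart IOPS over time while each failure is injected. For readers who want to observe the same behavior themselves, here is a minimal Python sketch that derives an IOPS figure from /proc/diskstats on a Linux client. The device name scinia (a typical name for the first mapped PowerFlex volume) and the one-second interval are assumptions for illustration, not part of our test setup.

```python
import time

def read_io_completions(device: str) -> int:
    """Return total completed reads + writes for a block device.

    /proc/diskstats columns (after major, minor, name):
    field 1 = reads completed, field 5 = writes completed.
    """
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == device:
                return int(parts[3]) + int(parts[7])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

def sample_iops(device: str = "scinia", interval: float = 1.0) -> None:
    """Print an IOPS sample every `interval` seconds (Ctrl+C to stop)."""
    prev = read_io_completions(device)
    while True:
        time.sleep(interval)
        cur = read_io_completions(device)
        print(f"{time.strftime('%H:%M:%S')}  {(cur - prev) / interval:8.0f} IOPS")
        prev = cur

if __name__ == "__main__":
    sample_iops()
```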
We started with a baseline workload, as shown in the following figure:
We then failed an SDR (Storage Data Replicator), observed the impact, and later restarted it to observe the recovery. The following figure shows the results:
1. Immediately after failing the SDR, we see a drop in I/O processing.
2. I/O resumes at a slightly lower level.
3. After restarting the SDR, I/O is briefly affected but eventually ramps back up to the baseline.
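A failure-injection sequence like this one can be scripted. The sketch below stops and restarts the SDR through systemd; the unit name powerflex-sdr is a placeholder (the actual service name depends on the PowerFlex release), and this is an illustrative outline rather than the exact procedure we used.

```python
import subprocess
import time

SDR_UNIT = "powerflex-sdr"  # placeholder unit name; varies by PowerFlex release

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

def fail_and_restart_sdr(hold_seconds: int = 60) -> None:
    """Stop the local SDR, leave it down for a while, then restart it.

    Run the IOPS monitor on a separate host so the measurement is not
    disturbed by the failure injection itself.
    """
    run(["systemctl", "stop", SDR_UNIT])   # inject the failure
    print("SDR stopped; observing I/O impact...")
    time.sleep(hold_seconds)               # let the system settle
    run(["systemctl", "start", SDR_UNIT])  # recover
    print("SDR restarted; expect a brief dip, then a return to baseline")

if __name__ == "__main__":
    fail_and_restart_sdr()
```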
We performed the same test for an SDS (Storage Data Server) failure. The results show that the system was more than capable of handling the workload with five active SDSs:
1. Baseline workload.
2. IOPS just after failing the SDS.
3. Five seconds later, I/O starts ramping up, and within 10 seconds the baseline workload resumes.
Also, as expected, we saw rebalance activity, as shown in the following figure.
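One way to confirm rebuild and rebalance activity outside the UI is the PowerFlex CLI. The following sketch filters the output of scli --query_all for relevant lines; the exact wording of that output varies across releases, so treat the filter as illustrative, and note that it assumes an authenticated scli session on the MDM node.

```python
import subprocess

def show_rebalance_status() -> None:
    """Print scli --query_all lines that mention rebuild or rebalance."""
    out = subprocess.run(
        ["scli", "--query_all"], capture_output=True, text=True, check=True
    ).stdout
    for line in out.splitlines():
        if "rebalance" in line.lower() or "rebuild" in line.lower():
            print(line.strip())

if __name__ == "__main__":
    show_rebalance_status()
```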
The following figure shows performance after SDS recovery:
1. When we restart the failed SDS, we see an immediate but insubstantial drop.
2. As the rebalance continues, I/O ramps back up to the baseline.
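Statements like "within 10 seconds the baseline workload resumes" can be derived mechanically from the sampled IOPS series. The helper below is a sketch with an arbitrary 90 percent threshold: it measures the time from the start of a dip until IOPS first returns near the baseline.

```python
def recovery_time(samples: list[float], baseline: float,
                  threshold: float = 0.9, interval: float = 1.0) -> float:
    """Seconds from the first sample below threshold*baseline until
    IOPS first climbs back to at least threshold*baseline.

    `samples` holds one IOPS reading per `interval` seconds; the 0.9
    threshold is an arbitrary choice for illustration.
    """
    floor = threshold * baseline
    dip_start = None
    for i, iops in enumerate(samples):
        if dip_start is None and iops < floor:
            dip_start = i                      # dip begins
        elif dip_start is not None and iops >= floor:
            return (i - dip_start) * interval  # recovered
    return float("nan")                        # never dipped, or never recovered

# Example: a dip-and-ramp pattern around an injected failure
print(recovery_time([100, 98, 40, 55, 80, 97, 101], baseline=100.0))  # -> 3.0
```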
Next, we failed a network link to demonstrate how the updated native load balancing affects the I/O rate, as shown in the following figure. In this configuration, four data links connect the systems.
1. Again, we establish a baseline.
2. We fail a link and notice a 3-second drop in I/O.
3. Five seconds later, the baseline returns.
After we reconnected the failed port, the baseline I/O level resumed within a few seconds, with no noticeable dip, as the following figure shows.
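An administrative link failure like this one can be reproduced with iproute2. The sketch below downs and restores a data interface while the IOPS monitor from the earlier sketch runs elsewhere; the interface name ens2f0 is a placeholder for one of the four data links, and the hold time is arbitrary.

```python
import subprocess
import time

DATA_LINK = "ens2f0"  # placeholder; one of the four data links in this setup

def set_link(state: str) -> None:
    """Bring the data link up or down with iproute2 (requires root)."""
    subprocess.run(["ip", "link", "set", "dev", DATA_LINK, state], check=True)

def link_failure_test(hold_seconds: int = 30) -> None:
    set_link("down")          # fail the link
    t0 = time.time()
    print("link down at t=0; expect a brief I/O dip")
    time.sleep(hold_seconds)  # traffic redistributes across the remaining links
    set_link("up")            # reconnect the port
    print(f"link restored after {time.time() - t0:.0f} s; "
          "baseline should resume within a few seconds")

if __name__ == "__main__":
    link_failure_test()
```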
All these failure scenarios demonstrate the resilience of PowerFlex. They also show that the system is well tuned and that rebuild activity does not have a severe impact on our workload.