Test failover workflow in VMware SRM
Test failover is a two-part process:
This section covers the general workflow of the test failover operation.
Before a test failover can be run, the following requirements must be met:
Once all required configurations are complete, the user can perform a test failover.
At a high level, a recovery plan test involves the following:
Once the user has confirmed the test failover operation can proceed, the recovery plan is initiated in the test mode.
A completed test recovery is shown in Figure 73. The test environment remains operational until a CLEANUP operation has been started.
The temporary snapshot, created by PowerFlex for the test failover, is shown in Figure 74. This snapshot is mounted to the target volume by a pointer redirection and used in the test. Note that the Access Mode is set to Read and Write for testing purposes.
The RCG status, by necessity, is now in a State of Failover Test as shown in Figure 75. It remains in this state until a CLEANUP operation is started.
Because SRM uses the Test Failover functionality in PowerFlex, it is important to understand the interaction between the two products. The following section describes this interaction.
PowerFlex offers the ability to run a Test Failover from both the GUI, as shown in Figure 76, and the CLI.
Once a recovery plan has been tested, the test environment can be discarded and reset by using the cleanup operation offered by SRM. The Cleanup operation automatically reverts all changes made by the recovery plan test and allows for subsequent failover operations.
The Cleanup operation performs the following operations:
Before resetting the environment after a test failover, ensure that the recovery plan worked. Verify the success of any custom scripts, application functionality, networking, and so on. Once all facets of the test have been verified by the involved stakeholders, a Cleanup operation can be started.
Note: After a test failover has been run, an actual failover or another test failover cannot be run until a cleanup operation has occurred. It is advisable to run a cleanup operation as soon as the test environment is no longer needed, so that any subsequent operations can be run without delay. In addition, the replication updates held in the source journals cannot be sent to the target until the cleanup completes.
As shown in Figure 77, a cleanup can only be run against a recovery plan if the recovery plan status is Test complete. Otherwise, the CLEANUP button is disabled. Furthermore, even if a test failover was not entirely successful, a cleanup operation still needs to be run before another test failover can be attempted.
The Test complete status is assigned to the Recovery Plan regardless of the level of success reached by the test failover. For example, as shown in Figure 78, the test failed, but the plan status is still Test complete.
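The gating rules described above can be sketched as a small state model. This is an illustrative toy only, not an SRM or PowerFlex API: the `RecoveryPlan` class, its method names, and the `READY` status used for a plan that has not been tested are all assumptions made for the example. Only the Test complete status and its behavior come from the text.

```python
from enum import Enum


class PlanStatus(Enum):
    READY = "Ready"                  # assumed name for an untested plan
    TEST_COMPLETE = "Test complete"  # status named in the text


class RecoveryPlan:
    """Hypothetical model of the rules in this section; not a real SRM object."""

    def __init__(self) -> None:
        self.status = PlanStatus.READY
        self.last_test_succeeded = None

    def run_test(self, succeeded: bool) -> None:
        # A test or failover cannot start until a pending test is cleaned up.
        if self.status is not PlanStatus.READY:
            raise RuntimeError("run a cleanup before another test or failover")
        self.last_test_succeeded = succeeded
        # The plan becomes Test complete whether or not the test succeeded.
        self.status = PlanStatus.TEST_COMPLETE

    def cleanup_enabled(self) -> bool:
        # Mirrors the CLEANUP button: enabled only in Test complete.
        return self.status is PlanStatus.TEST_COMPLETE

    def cleanup(self) -> None:
        if not self.cleanup_enabled():
            raise RuntimeError("cleanup requires a plan in Test complete")
        self.status = PlanStatus.READY


plan = RecoveryPlan()
plan.run_test(succeeded=False)   # a failed test still ends in Test complete
print(plan.status.value, plan.cleanup_enabled())
plan.cleanup()                   # the plan can now be tested or failed over again
```

The point of the model is that the outcome of the test is recorded separately from the plan status, which is why a failed test still requires a cleanup.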
The cleanup process is initiated, in a similar fashion to the test failover process, by clicking the CLEANUP link after selecting the appropriate Recovery Plan. This is shown in Figure 79.
The CLEANUP link launches a set of windows, similar to those shown for the original test operation, to confirm the reset activities that it runs. The first attempt at running this cleanup after a particular failover offers no configurable parameters and simply displays details for confirmation. This set of screens is shown in Figure 80.
Figure 81 shows the steps taken by the cleanup process itself.
For various reasons, the first attempt at a cleanup operation may fail. Typical causes include:
Note: Errors reported in the SRM interface can often be generic. See the PowerFlex logs on the recovery site if the error indicates that a failure is related to storage operations.
In cases such as these, the first cleanup operation, which does not permit the use of force, fails. This is because, on the first run, the cleanup operation does not tolerate a failure in any step of the cleanup process. Therefore, if the cleanup process encounters an error, it immediately fails as shown in Figure 82.
Note that the Force cleanup is disabled during the first run.
Once the cleanup process has failed for the first time, the ability to force the cleanup becomes available. The cleanup confirmation wizard, when run after a failure, offers a checkbox to force the cleanup as shown in Figure 83. This alters the behavior of the cleanup process to ride through any error encountered. Any operation that can be completed successfully is completed and, unlike before, any operation that encounters an error is skipped.
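The difference between a normal and a forced cleanup can be illustrated with a short sketch. It is a simplified model under stated assumptions, not SRM's implementation: `run_cleanup`, `CleanupWizard`, `CleanupError`, and the placeholder step names are hypothetical, and the real cleanup steps are not modeled.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]  # (placeholder step name, action)


class CleanupError(Exception):
    """Raised when a non-forced cleanup hits its first failing step."""


def run_cleanup(steps: List[Step], force: bool = False):
    """Run steps in order; stop at the first error unless force is set."""
    completed, skipped = [], []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            if not force:
                # Non-forced cleanup tolerates no failure in any step.
                raise CleanupError(f"cleanup failed at {name}: {exc}") from exc
            skipped.append(name)  # forced cleanup rides through the error
        else:
            completed.append(name)
    return completed, skipped


class CleanupWizard:
    """Hypothetical wrapper: force is offered only after a failed attempt."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.force_available = False

    def run(self, force: bool = False):
        if force and not self.force_available:
            raise ValueError("force is offered only after a failed cleanup")
        try:
            return run_cleanup(self.steps, force=force)
        except CleanupError:
            self.force_available = True  # the checkbox appears on the next run
            raise
```

In this model, the list of skipped steps returned by a forced run corresponds to the operations that may need manual follow-up afterwards.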
In general, it is not advisable to resort to the force cleanup unless an actual failover operation needs to be run immediately and there is no time to troubleshoot the issues encountered in the cleanup. Otherwise, attempt to resolve any issues first and then retry a non-forced cleanup before using the force option. If a force cleanup is used in haste, it may require additional manual intervention afterwards, because PowerFlex, SRM, or both may not be able to make themselves ready for another test failover or failover without user intervention. When a force cleanup is run, review the logs to identify the exact errors encountered. If necessary, resolve these issues and attempt to run another test failover as soon as possible to verify that the environment is functioning correctly. Force cleanups are common when the PowerFlex GUI is used together with SRM for Test Failover. As previously noted, this practice should be avoided.