Once a recovery plan has been tested, the test environment can be discarded and reset by using the Cleanup operation offered by SRM. The Cleanup operation automatically reverts the changes made by the recovery plan test and prepares the environment for subsequent test failover or failover operations.
The Cleanup operation performs the following tasks:
- Powers off and unregisters the test virtual machines.
- Unmounts and detaches the snapshot VMFS volumes or RDMs.
- Replaces the recovered virtual machines with the original placeholders (shadow VMs), preserving their identity and configuration information.
- Removes the PowerFlex storage snapshots that the recovered virtual machines used during the test, which discards any changes made during the test itself.
Before resetting the environment after a test failover, ensure that the recovery plan worked as expected. Verify the success of any custom scripts, application functionality, networking, and so on. Once the involved stakeholders have verified all facets of the test, the Cleanup operation can be started.
As shown in the figure below, a cleanup can only be run against a recovery plan if the recovery plan status is Test complete. Otherwise, the CLEANUP button is disabled. Furthermore, even if a test failover was not entirely successful, a cleanup operation must still be run before another test failover can be attempted.
The Test complete status is assigned to the recovery plan regardless of how successful the test failover was. For example, in the figure below the test failed, but the plan status is still Test complete.
The cleanup process is initiated in the same way as the test failover process: select the appropriate recovery plan and click CLEANUP. This is shown in the figure below.
Clicking CLEANUP opens a set of confirmation windows similar to those displayed by the original test operation, listing the reset activities that the cleanup performs. The first attempt at running a cleanup after a particular test failover offers no configurable parameters and only displays details for confirmation. These screens are shown in the figure below.
The figure below shows the steps taken by the cleanup process itself.
The first attempt at a cleanup operation may fail for various reasons. Typical causes include:
- Test failover did not complete successfully.
- Storage snapshot creation failure.
- Virtual machine inventory mappings are incorrect.
- Environment change after the test failover but before Cleanup.
- Manual change to storage outside of SRA.
- Significant protection group change.
- VMware environment failure.
- Manual change to VMware environment outside of SRM.
In cases such as these, the first cleanup operation, which does not permit the use of force, fails. On the first run, the cleanup operation does not tolerate a failure in any step of the cleanup process, so it fails immediately when it encounters an error, as shown in the following figure.
Note that the Force cleanup option is disabled during the first run.
Once the cleanup process has failed for the first time, the ability to force the cleanup becomes available. When run after a failure, the cleanup confirmation wizard offers a checkbox to force the cleanup, as shown in the following figure. Selecting it changes the behavior of the cleanup process so that it continues past any errors it encounters: every operation that can be completed successfully is completed and, unlike in the first run, any operation that encounters an error is skipped instead of stopping the cleanup.
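The difference between a normal and a forced cleanup can be sketched as a fail-fast loop versus a skip-and-continue loop. This is a hypothetical model of the behavior described above, not SRM code: `run_cleanup`, `CleanupResult`, the `previous_attempt_failed` flag, the step names, and the error message are all invented for illustration, and each step is assumed to signal failure by raising an exception.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model: a cleanup step is (name, action); an action raises on failure.
Step = Tuple[str, Callable[[], None]]


@dataclass
class CleanupResult:
    finished: bool                                      # did the cleanup run to the end?
    completed: List[str] = field(default_factory=list)  # steps that succeeded
    skipped: List[str] = field(default_factory=list)    # failed steps passed over (force only)
    error: str = ""                                     # first error of a non-forced run


def run_cleanup(steps: List[Step], force: bool = False,
                previous_attempt_failed: bool = False) -> CleanupResult:
    # Force is only offered after a cleanup attempt has already failed.
    if force and not previous_attempt_failed:
        raise ValueError("force cleanup is disabled on the first run")
    result = CleanupResult(finished=True)
    for name, action in steps:
        try:
            action()
            result.completed.append(name)
        except Exception as exc:
            if not force:
                # Non-forced run: any error stops the whole cleanup immediately.
                result.finished = False
                result.error = f"{name}: {exc}"
                return result
            # Forced run: skip the failing step and keep going.
            # Skipped steps may need manual follow-up afterwards.
            result.skipped.append(name)
    return result


if __name__ == "__main__":
    def ok() -> None:
        pass

    def fail() -> None:
        # Invented error, for illustration only.
        raise RuntimeError("snapshot volume still mapped")

    steps = [("power off test VMs", ok),
             ("detach snapshot volumes", fail),
             ("remove snapshots", ok)]
    first = run_cleanup(steps)
    print(first.finished, first.completed, first.error)
    forced = run_cleanup(steps, force=True,
                         previous_attempt_failed=not first.finished)
    print(forced.finished, forced.completed, forced.skipped)
```

In this model the forced run always finishes, but the `skipped` list records exactly which steps did not complete, which is the information an administrator would need when reviewing what to fix manually.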
In general, avoid resorting to a forced cleanup unless an actual failover must be run immediately and there is no time to troubleshoot the issues encountered during the cleanup. Otherwise, attempt to resolve any issues first and then retry a non-forced cleanup. A forced cleanup used in haste may require additional manual intervention afterwards, because PowerFlex and SRM might not be able to make themselves ready for another test failover or failover without user intervention.
After a forced cleanup has been run, review the logs to identify the exact errors encountered. If necessary, resolve these issues and run another test failover as soon as possible to verify that the environment is functioning correctly. A forced cleanup is commonly required when the PowerFlex UI is used alongside SRM during a test failover. As previously noted, this should be avoided.