Reprotection


After a recovery plan has run, the environment often must remain protected against failure to ensure its resilience and to meet disaster recovery objectives. SRM offers reprotection, an extension to recovery plan management that enables the environment at the recovery site to establish replication and protection back to the original protected site. This behavior allows users to recover the environment quickly and easily back to the original site if necessary.
It is important to note that an unassisted reprotection by SRM may not always be possible, depending on the circumstances and results of the preceding recovery operation. Recovery plans run in planned migration mode are the likeliest candidates for a subsequent successful automated reprotection by SRM. Exceptions occur if certain failures or changes have taken place between the time the recovery plan was run and the time the reprotection operation was initiated. Those situations may cause the reprotection to fail. Similarly, if a recovery plan was run in disaster recovery mode, any persisting failures may cause a partial or complete failure of the reprotection.
These different situations are described in the following sections.
Reprotect after Planned Migration
The scenario most likely to lead to a successful reprotection is one that follows a planned migration. In a planned migration, no failures in either the storage or compute environment preceded the recovery operation. Therefore, reversing recovery plans and protection groups, and swapping and establishing replication in the reverse direction, is possible.
If failed-over virtual machines eventually need to be returned to the original site or if they require PowerFlex replication protection, it is recommended to run a reprotect operation as soon as possible after a migration.
Reprotect is only available after a recovery operation has occurred, which is indicated by the recovery plan being in the Recovery complete state. Later versions of SRM display a Reprotect needed warning, as shown in Figure 91.
Figure 91. SRM warns the user to run reprotect
A reprotect can be run by selecting the appropriate recovery plan and then selecting the REPROTECT link as shown in Figure 92.
Figure 92. Performing a reprotect operation in SRM
The reprotect operation does the following things:
- Reverses protection groups. The protection groups are deleted on the original protection SRM server and are recreated on the original recovery SRM server. The inventory mappings are configured (assuming the user has preconfigured them in SRM on the recovery site) and the necessary shadow or placeholder VMs are created and registered on the newly designated recovery SRM server.
- Reverses recovery plan. The failed-over recovery plan is deleted on the original recovery SRM server and recreated with the newly reversed protection group.
- Swaps personality of PowerFlex RCG pairs. The PowerFlex SRA performs a swap on the target pairs which enables replication to be established back to the original site. Target becomes source and vice versa.
- Reestablishes replication. After the swap, the PowerFlex SRA incrementally re-establishes replication between the RCG pairs, but in the opposite direction from what it was before the failover/migration.
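The four steps above can be pictured as a sequence of state changes. The following is a minimal sketch only: the `Environment` class, its fields, and the step functions are hypothetical names invented for illustration, and they do not correspond to any SRM or PowerFlex SRA API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical snapshot of an SRM/PowerFlex pairing (illustration only)."""
    protected_site: str   # site whose VMs the protection group protects
    recovery_site: str    # site holding the placeholder VMs
    plan_direction: str   # direction in which the recovery plan fails over
    rcg_source: str       # site holding the RCG source volumes
    rcg_target: str       # site holding the RCG target volumes
    replicating: bool     # whether replication is currently running


def reverse_protection_groups(env: Environment) -> Environment:
    # Protection groups move to the original recovery site; placeholder VMs
    # end up on the newly designated recovery site.
    return replace(env, protected_site=env.recovery_site,
                   recovery_site=env.protected_site)


def reverse_recovery_plan(env: Environment) -> Environment:
    # The plan is recreated against the already reversed protection group.
    return replace(env,
                   plan_direction=f"{env.protected_site}->{env.recovery_site}")


def swap_rcg_personality(env: Environment) -> Environment:
    # Target becomes source and vice versa.
    return replace(env, rcg_source=env.rcg_target, rcg_target=env.rcg_source)


def reestablish_replication(env: Environment) -> Environment:
    # Replication resumes, now in the opposite direction.
    return replace(env, replicating=True)


def reprotect(env: Environment) -> Environment:
    """Apply the four reprotect steps in the order listed above."""
    for step in (reverse_protection_groups, reverse_recovery_plan,
                 swap_rcg_personality, reestablish_replication):
        env = step(env)
    return env
```

In this model, an environment that was failed over from `site-a` to `site-b` ends the reprotect with `site-b` as both the protected site and the RCG source, and with replication running toward `site-a`.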
The PowerFlex GUI events log records the reversal of replication. An example is shown in Figure 93.
Figure 93. Reverse replication event in PowerFlex GUI
Figure 94 shows the steps involved in a reprotect operation.
Note: If the command syncOnce fails during reprotect, the process still completes successfully.
Figure 94. Reprotect operation steps
Reprotect after a temporary failure
The previous section describes the best possible scenario for a smooth reprotection because it follows a planned migration where no errors are encountered. For recovery plans failed over in disaster recovery mode, this may not be the case.
Disaster recovery mode allows for failures ranging from a small failure to a full site failure of the protection data center. If these failures are temporary and recoverable, a fully successful reprotection may be possible once those failures have been rectified. In this case, a reprotection behaves similarly to the scenario described in the previous section. If a reprotection is run before the failures are corrected, or if certain failures cannot be fully recovered, an incomplete reprotection operation occurs. This section describes this scenario.
For reprotect to be available, the following steps must first occur:
- A recovery must be run with all the steps finishing successfully. If there are any errors during the recovery, the user needs to resolve the issues that caused the errors and then re-run the recovery.
- The original site should be available and the SRM servers at both sites should be in a connected state. If the sites are disconnected, reprotect fails immediately as shown in Figure 95. If the original site cannot be restored (for example, if a physical catastrophe destroys the original site), automated reprotection cannot be run, and manual recreation is required if and when the original protected site is rebuilt.
Figure 95. Reprotect fails with sites disconnected
If the protected site SRM server was disconnected during failover and is reconnected later, SRM retries certain recovery operations before allowing reprotect. This typically occurs if the recovery plan was not able to connect to the protected site vCenter server and power off the virtual machines due to network connectivity issues. If network connectivity is restored after the recovery plan was failed over, SRM detects this situation and requires the recovery plan to be re-run in order to power those VMs down.
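The availability conditions described above can be summarized as a simple checklist. This is a hedged sketch: `ReprotectContext`, its fields, and `reprotect_blockers` are hypothetical names used only to illustrate the logic, and the values would have to be gathered by the administrator from SRM, not from any API shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReprotectContext:
    """Hypothetical facts an administrator would confirm before reprotecting."""
    recovery_steps_succeeded: bool    # every recovery step finished without error
    sites_connected: bool             # SRM servers at both sites are connected
    protected_vms_powered_off: bool   # VMs at the original site were powered off


def reprotect_blockers(ctx: ReprotectContext) -> List[str]:
    """Return the actions still required before reprotect can be attempted."""
    blockers = []
    if not ctx.recovery_steps_succeeded:
        blockers.append("resolve the recovery errors and re-run the recovery")
    if not ctx.sites_connected:
        blockers.append("restore the connection between the SRM servers")
    if not ctx.protected_vms_powered_off:
        blockers.append("re-run the recovery plan to power off the original VMs")
    return blockers
```

An empty list means none of the conditions modeled here stands in the way; it does not guarantee that the reprotect itself will succeed.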
A reprotection operation fails if it encounters any errors the first time it runs. If so, the reprotect must be run a second time but with the Force cleanup option selected as shown in Figure 96.
Figure 96. Forcing a reprotect operation
Once the force option is selected, any errors are acknowledged and reported but ignored. This allows the reprotect operation to continue even if it has experienced errors. All the possible steps are attempted and completed. Therefore, in certain situations, the PowerFlex replication may not be properly reversed even though the recovery plan and protection groups were. If the Configure Storage to Reverse Direction step fails, manual user intervention with the PowerFlex GUI or CLI may be required to complete the process. The user should ensure that:
- A source/target swap has occurred by ensuring the replicated target/source devices have changed personalities.
- Asynchronous replication has been re-established.
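The difference between a first run and a forced run, and the follow-up check on the storage, can be sketched as below. This is an illustration under stated assumptions: `run_reprotect` and `storage_reversed` are hypothetical helpers, each step is modeled as a plain Python callable, and only the step name Configure Storage to Reverse Direction comes from this document.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


def run_reprotect(steps: List[Step],
                  force_cleanup: bool = False) -> Tuple[bool, List[str]]:
    """Run the steps in order and return (completed, failed_step_names).

    Without force_cleanup the run stops at the first failing step.
    With force_cleanup every step is attempted and failures are only recorded.
    """
    failed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception:
            failed.append(name)
            if not force_cleanup:
                return False, failed
    return True, failed


def storage_reversed(original_source: str, current_source: str,
                     replication_active: bool) -> bool:
    """Check the two conditions listed above: roles swapped, replication running."""
    return current_source != original_source and replication_active
```

If a forced run reports Configure Storage to Reverse Direction among the failed steps, `storage_reversed` returning `False` in this model corresponds to the case where manual intervention is still needed.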
If a temporary storage failure or replication partition happens, it is likely that manual intervention is required prior to performing a reprotect operation. In this situation, the source devices may not have been unmounted.
Reprotect after a failover due to unrecoverable failure
In extreme circumstances, the storage and/or the compute environment may be rendered completely unrecoverable due to a disaster. In this scenario, reprotect might not be possible. The process of reprotecting the original recovery site is then no different from setting up the protection groups and recovery plans from scratch. An example of an unrecoverable failure would be if the protection site array was lost and then replaced, requiring new RCG pair relationships.
