TestFailoverWithoutLocalSnapshots with Cascaded SRDF/Star
Due to the nature of cascaded replication, the workflow for TestFailoverWithoutLocalSnapshots with Cascaded SRDF/Star differs slightly depending on whether the target site is the Synchronous site or the Asynchronous site. Each workflow is discussed separately in the two subsequent sub-sections.
The SRDF SRA supports performing test failover with TestFailoverWithoutLocalSnapshots enabled with Cascaded SRDF/Star with the recovery site at the Synchronous target site. This presumes that the FailoverToAsyncSite option is disabled for both instances of the SRDF SRA.
Before a test with either configuration can be executed, both active links of the Cascaded SRDF/Star configuration must be in the “Protected” state. If either link is in any state other than “Protected”, the operation will fail.
The overall Star state can be “Protected” or “Unprotected” for test failover, but it is highly recommended that the Star state initially be “Protected”. Note that the SRA fully protects Star during the cleanup stage regardless of the initial Star state, so a configuration that was “Unprotected” before the test failover will be “Protected” after cleanup.
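The two preconditions above can be expressed as a small pre-check. The following is a minimal sketch only: the `StarStatus` class and `precheck` function are hypothetical names invented for illustration and are not part of the SRDF SRA or Solutions Enabler. It assumes the link and Star states have already been gathered by some other means.

```python
from dataclasses import dataclass

PROTECTED = "Protected"
UNPROTECTED = "Unprotected"


@dataclass
class StarStatus:
    """Hypothetical container for the states described in the text."""
    sync_link: str   # state of the workload -> Sync link
    async_link: str  # state of the Sync -> Async link
    star: str        # overall Star state


def precheck(status):
    """Return (ok, messages) for a planned test failover."""
    messages = []
    # Both active links must be Protected, otherwise the operation fails.
    for name, state in (("sync", status.sync_link), ("async", status.async_link)):
        if state != PROTECTED:
            messages.append(f"ERROR: {name} link is {state!r}; it must be 'Protected'")
    ok = not messages
    # An Unprotected Star state is allowed but not recommended.
    if ok and status.star == UNPROTECTED:
        messages.append("WARNING: Star is 'Unprotected'; cleanup will leave it 'Protected'")
    return ok, messages


ok, messages = precheck(StarStatus("Protected", "Protected", "Unprotected"))
print(ok, messages)
```

A failed link check is treated as fatal, while an “Unprotected” Star state only produces a warning, which mirrors the distinction the text draws between a requirement and a recommendation.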
Table 15 chronologically depicts the steps automatically taken by the SRDF SRA for test failover to the Synchronous target site of a Cascaded SRDF/Star configuration[1].
Note: When testing Cascaded SRDF/Star with the TestFailoverWithoutLocalSnapshots mechanism on the Synchronous target site, BOTH links are suspended for the duration of the test. There is therefore NO continuous remote protection until the test has completed and the devices have been resynchronized. Because the test runs on the Synchronous site, changes made during the test invalidate that site as a copy of the production data until it is resynchronized. The Asynchronous site retains a valid point-in-time copy, but if the workload site is lost during the test, any updates committed to the production data during the test will be lost. It is very important to understand these repercussions before choosing this method of test failover.
Table 15. TestFailoverWithoutLocalSnapshots to Synchronous target site with Cascaded SRDF/Star
Step | Star action | Description |
1 | Disable Star | Before any site can be manipulated the overall Star state must be “Unprotected”. If Star is protected at the start of the test failover, the SRDF SRA will disable Star. |
2 | Unprotect Async site | Disables consistency at the async site. Due to the nature of a Cascaded setup, the async site must be unprotected and disconnected before the Sync site can be isolated. An unprotect operation must occur before the site can be disconnected. |
3 | Disconnect Async site | Suspends RDF replication from the R21 (sync) to the R2 (async) devices. If this action were not performed, any changes to the sync site during the test would be propagated to the async site, leaving no valid point-in-time copies of the R1 device. |
4 | Isolate Sync site | Disables consistency and suspends RDF replication from the R1 (workload) devices to the R21 (Sync) and sets the R21 devices to read/write enabled. |
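The ordering constraints in the table above can be illustrated with a small state model. This is a sketch under stated assumptions, not the SRA's implementation: the site state names (“Connected”, “Disconnected”, “Isolated”) and the transition table are inferred from the step descriptions, and `apply_steps` is a hypothetical helper.

```python
# Site state changes for each Star action. The state names and the
# transitions are assumptions inferred from the step descriptions.
SITE_TRANSITIONS = {
    ("unprotect", "Protected"): "Connected",
    ("disconnect", "Connected"): "Disconnected",
    ("isolate", "Protected"): "Isolated",
}

# The four steps of the table, in order.
SYNC_TEST_FAILOVER = [
    ("disable", "star"),
    ("unprotect", "async"),
    ("disconnect", "async"),
    ("isolate", "sync"),
]


def apply_steps(state, steps):
    """Return the state after applying steps; raise if a step is out of order."""
    state = dict(state)  # work on a copy
    for action, target in steps:
        if target == "star":
            if action != "disable":
                raise ValueError(f"unsupported Star action: {action}")
            state["star"] = "Unprotected"  # no change if already Unprotected
            continue
        # No site can be manipulated while Star is still protected.
        if state["star"] != "Unprotected":
            raise RuntimeError("Star must be Unprotected before a site is manipulated")
        new_state = SITE_TRANSITIONS.get((action, state[target]))
        if new_state is None:
            raise RuntimeError(f"cannot {action} {target} site while it is {state[target]}")
        state[target] = new_state
    return state


start = {"star": "Protected", "sync": "Protected", "async": "Protected"}
final = apply_steps(start, SYNC_TEST_FAILOVER)
print(final)  # {'star': 'Unprotected', 'sync': 'Isolated', 'async': 'Disconnected'}
```

Attempting the disconnect before the unprotect, or any site action before Star is disabled, raises an error in this model, which reflects the dependencies stated in the step descriptions.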
Figure 84 shows a diagram of the final state of a Cascaded SRDF/Star environment when a test failover operation has completed and before an SRM Cleanup operation is run.
Figure 84. TestFailoverWithoutLocalSnapshots enabled for Cascaded SRDF/Star to the Sync target site
Figure 84 shows that the recovered test VMware environment is at the Synchronous site and consequently the devices in that location are read/write enabled. The Asynchronous target site devices are also disconnected but remain write disabled, which ensures that at least one valid point-in-time copy of the R1 workload devices persists throughout the test.
Once the test failover has completed and all desired application tests are verified, the test failover can be terminated and reverted by clicking the Cleanup operation button.
Table 16 chronologically depicts the steps taken by the SRDF SRA for the test failover cleanup operation for the Synchronous target site of a Cascaded SRDF/Star configuration.
Table 16. TestFailoverWithoutLocalSnapshots Cleanup operation with the Synchronous target site with Cascaded SRDF/Star
Step | Star action | Description |
1 | Disconnect Sync site | Star requires that a site be in the disconnected state before it can be reconnected and replication re-established. Therefore a disconnect operation must be run to transition the sync site out of the isolated state. |
2 | Connect Sync site | Incrementally re-establish replication to Sync site from the workload site. |
3 | Connect Async site | Incrementally re-establish replication from the Sync site to the Async site. |
4 | Protect Sync site | Re-activate consistency protection to the Sync site. |
5 | Protect Async site | Re-activate consistency protection to the Async site. |
6 | Enable Star | Re-enables consistency protection and 3-site recoverability across the three sites. |
The SRDF SRA supports performing test failover with the TestFailoverWithoutLocalSnapshots behavior with Cascaded SRDF/Star with the recovery site at the Asynchronous target site. This presumes that the FailoverToAsyncSite option is set to enabled for both instances of the SRDF SRA.
Before a test can be executed, both active links of the Cascaded SRDF/Star configuration must be in the “Protected” state. If either link is in any state other than “Protected”, the operation will fail.
The overall Star state can be “Protected” or “Unprotected” for test failover, but it is highly recommended that the Star state be “Protected” prior to the test failover. Note that the SRA fully protects Star during the cleanup stage regardless of the initial Star state, so a configuration that was “Unprotected” before the test failover will be “Protected” after cleanup.
Table 17 chronologically depicts the steps taken by the SRDF SRA for the test failover.
Table 17. TestFailoverWithoutLocalSnapshots to Asynchronous target site with Cascaded SRDF/Star
Step | Star action | Description |
1 | Disable Star | Before any site can be manipulated the overall Star state must be “Unprotected”. If Star is protected at the start of the test failover, the SRDF SRA will disable Star. |
2 | Isolate Async site | Disables consistency protection and suspends RDF replication from the R21 (sync) devices to the R2 (async) and sets the R2 devices to read/write enabled. |
Figure 85 shows a diagram of the final state of a Cascaded SRDF/Star environment when a test failover operation has completed and before an SRM Cleanup operation is run.
Figure 85. TestFailoverWithoutLocalSnapshots enabled for Cascaded SRDF/Star to the Async target site
Figure 85 shows that the recovered test VMware environment is at the Asynchronous site and that the devices in that location are read/write enabled. The Synchronous target site devices remain synchronized and consistent with the workload site without interruption throughout the test procedure.
Once the test failover has completed and all desired application tests are verified, the test failover can be terminated and reverted by clicking the Cleanup operation button.
Table 18 chronologically depicts the steps taken by the SRDF SRA for the test failover cleanup operation for the Asynchronous target site of a Cascaded SRDF/Star configuration.
Table 18. TestFailoverWithoutLocalSnapshots Cleanup operation with the Asynchronous target site with Cascaded SRDF/Star
Step | Star action | Description |
1 | Disconnect Async site | Star requires that a site be in the disconnected state before it can be reconnected and replication re-established. Therefore a disconnect operation must be run to transition the async site out of the isolated state. |
2 | Connect Async site | Incrementally re-establish replication from the Sync site to the Async site. |
3 | Protect Async site | Re-activate consistency protection to the Async site. |
4 | Enable Star | Re-enables consistency protection and 3-site recoverability across the three sites. |
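The cleanup tables for the Synchronous and the Asynchronous target follow the same pattern: disconnect any isolated site, reconnect sites in cascade order, re-protect them, and finally enable Star. The sketch below derives the step list from the site states left behind by the test. It is an illustrative model only; `cleanup_plan` is a hypothetical function and the state names are assumptions based on the step descriptions.

```python
KNOWN_STATES = {"Protected", "Connected", "Disconnected", "Isolated"}


def cleanup_plan(sync_state, async_state):
    """Return the ordered cleanup steps for the given site states."""
    # Cascade order matters: the Sync site is handled before the Async site.
    sites = [("Sync", sync_state), ("Async", async_state)]
    for name, state in sites:
        if state not in KNOWN_STATES:
            raise ValueError(f"unknown state for {name} site: {state}")

    plan = []
    # An isolated site must be disconnected before it can be reconnected.
    for name, state in sites:
        if state == "Isolated":
            plan.append(f"Disconnect {name} site")
    # Incrementally re-establish replication on every suspended hop.
    for name, state in sites:
        if state in ("Isolated", "Disconnected"):
            plan.append(f"Connect {name} site")
    # Re-activate consistency protection where it was removed.
    for name, state in sites:
        if state != "Protected":
            plan.append(f"Protect {name} site")
    plan.append("Enable Star")
    return plan


# After a test on the Sync site: Sync is isolated, Async is disconnected.
print(cleanup_plan("Isolated", "Disconnected"))
# After a test on the Async site: Sync stayed protected, Async is isolated.
print(cleanup_plan("Protected", "Isolated"))
```

With these inputs the model produces six steps for the Synchronous target and four for the Asynchronous target, in the same order as the corresponding cleanup tables.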
Test failover with TestFailoverWithoutLocalSnapshots enabled is also available for 3-site Non-Star configurations.
TestFailoverWithoutLocalSnapshots has the same requirements as test failover with TimeFinder:
If the Concurrent SRDF configuration is valid, the SRDF SRA will proceed with the RDF Split operation on the replication between the Workload site and the Asynchronous site. The Split operation will suspend replication and make the target devices Read/Write enabled to the hosts. An example of a Concurrent SRDF environment after the test failover operation but before the Cleanup operation is depicted in the diagram in Figure 86. This example is for a Sync/Async environment but for other configurations the same operations are executed and the workflow is no different.
Figure 86. TestFailoverWithoutLocalSnapshots and Concurrent SRDF
Once the test failover has completed and all desired application tests are verified, the test failover can be terminated and reverted by clicking the Cleanup operation button.
The Cleanup operation consists of performing an incremental establish on the target devices which will revert them to their write-disabled state and overwrite any invalid tracks on the R2 devices.
The SRDF SRA supports the TestFailoverWithoutLocalSnapshots option in Non-Star 3-site Cascaded SRDF.
TestFailoverWithoutLocalSnapshots has the same requirements as test failover with TimeFinder:
Note: When using TestFailoverWithoutLocalSnapshots to fail over to the secondary site (usually the Synchronous site), the SRA does not suspend the replication from the secondary site to the tertiary one. Tracks changed on the secondary site during the test are therefore propagated to the tertiary site, which makes both sites invalid copies of the R1 for the duration of the test. For this reason, it is recommended to take a manual, targetless gold copy of either the secondary or the tertiary site using TimeFinder prior to test recovery. This provides, at the very least, a valid point-in-time copy of the R1 data during the test in case of a simultaneous failure at the production site.
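The effect described in this note can be illustrated with a toy model in which each site is a dictionary of track contents. This is purely a conceptual sketch: the function name, the track data, and the way replication is modeled are all invented for illustration and do not represent how SRDF or TimeFinder operate internally.

```python
def simulate_cascaded_test(take_gold_copy):
    """Model a test on the secondary site while the second hop stays active."""
    # R1 contents at the moment the first hop is split.
    r1_at_split = {0: "prod-a", 1: "prod-b"}
    r21 = dict(r1_at_split)  # secondary site, in sync at split time
    r2 = dict(r1_at_split)   # tertiary site, in sync at split time

    # Optional manual gold copy taken before the test starts.
    gold = dict(r21) if take_gold_copy else None

    # The test environment writes to the now read/write R21 devices.
    r21[1] = "test-write"
    # The secondary -> tertiary hop is not suspended, so the change propagates.
    r2.update(r21)

    return {
        "secondary_valid": r21 == r1_at_split,
        "tertiary_valid": r2 == r1_at_split,
        "gold_copy_valid": gold == r1_at_split if gold is not None else False,
    }


print(simulate_cascaded_test(take_gold_copy=False))
print(simulate_cascaded_test(take_gold_copy=True))
```

Without the gold copy, no site in the model still matches the R1 data as of the split. With it, one valid point-in-time copy survives the test.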
If the Cascaded SRDF configuration is valid, the SRDF SRA will proceed with the RDF Split operation on the target site. The Split operation will suspend replication and make the target devices read/write enabled to the hosts. An example of a Cascaded SRDF environment after the test failover operation when the target is the tertiary site is depicted in the diagram in Figure 87.
Figure 87. TestFailoverWithoutLocalSnapshots and Cascaded SRDF
Once the test failover has completed and all desired application tests are verified, the test failover can be terminated and reverted by clicking the Cleanup operation button.
The Cleanup operation consists of performing an incremental establish on the target devices which will revert them to their write-disabled state and overwrite any invalid tracks on them.
The SRDF SRA allows the options TestFailoverWithoutLocalSnapshots and IgnoreActivatedSnapshots to be enabled simultaneously. IgnoreActivatedSnapshots is discussed in detail in the previous section in this chapter, IgnoreActivatedSnapshots. IgnoreActivatedSnapshots allows the user to prepare the recovery devices manually instead of having the SRA do it. In the case of TestFailoverWithoutLocalSnapshots, this means the user can split the RDF links before the test failover operation begins and, if desired, resume the replication before the SRM cleanup operation.
When both of these options are enabled, the SRA examines the R2 devices to determine whether they have been split beforehand. If the devices have been split, the test failover operation skips the RDF split operation and moves on to recovering the devices. If only some of the devices in the recovery plan are split, while other R2 devices are still write disabled with RDF pair states of Synchronized or Consistent, the SRA skips the devices that are already split and splits only the remaining ones.
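The selection logic described above can be sketched as a simple partition of the devices by RDF pair state. The function name and the device identifiers are hypothetical, and rejecting any pair state other than Split, Synchronized, or Consistent is an assumption of this sketch rather than documented SRA behavior.

```python
def devices_to_split(pair_states):
    """Partition devices into those still to be split and those to skip.

    pair_states maps a device identifier to its RDF pair state.
    """
    to_split, already_split = [], []
    for device, state in pair_states.items():
        if state == "Split":
            # Prepared manually by the user; nothing left to do.
            already_split.append(device)
        elif state in ("Synchronized", "Consistent"):
            # Still replicating and write disabled; must be split.
            to_split.append(device)
        else:
            # Assumption: any other state is treated as an error here.
            raise ValueError(f"unexpected pair state for {device}: {state}")
    return to_split, already_split


# Hypothetical device identifiers, for illustration only.
states = {"00A1": "Split", "00A2": "Synchronized", "00A3": "Consistent"}
print(devices_to_split(states))  # (['00A2', '00A3'], ['00A1'])
```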
In the case of 2-site RDF configurations or 3-site Non-Star configurations, the RDF command symrdf split is the only required operation on the relevant RDF pairs. In the case of SRDF/Star the operation is slightly different: the SRA expects the Star configuration to be disabled and the target site to be in the “isolated” state. Refer to the previous relevant section for a given SRDF configuration for the particular required states. The user should follow exactly the steps that the SRDF SRA itself would issue.
After the test failover has been completed and before the cleanup operation is initiated, the user can either resume replication manually or let the SRA resume replication automatically during cleanup. It is generally recommended to let the SRA resume replication, as this ensures that all of the proper steps for cleaning up the environment are followed. If manual resynchronization is desired before the cleanup, 2-site RDF pairs or 3-site Non-Star configurations should be returned to the SyncInProg, Synchronized, or Consistent state. SRDF/Star requires that the target site be disconnected, reconnected, and protected, and that Star be re-enabled, so that Star is back in the overall protected state.
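The states listed above for a manual resynchronization can be captured in a short check. This is a minimal sketch with a hypothetical function name and arguments. It assumes the pair states or the overall Star state have already been collected, and it does not query an array.

```python
# Pair states that indicate replication has been resumed (from the text above).
RESUMED_PAIR_STATES = {"SyncInProg", "Synchronized", "Consistent"}


def manually_resumed(config_type, pair_states=(), star_state=None):
    """Return True if replication looks manually resumed before cleanup.

    config_type is "star" for SRDF/Star; any other value is treated as a
    2-site or 3-site Non-Star configuration.
    """
    if config_type == "star":
        # Star must be back in the overall protected state.
        return star_state == "Protected"
    states = list(pair_states)
    # Every pair must be in one of the resumed states.
    return bool(states) and all(s in RESUMED_PAIR_STATES for s in states)


print(manually_resumed("non-star", ["SyncInProg", "Consistent"]))
print(manually_resumed("star", star_state="Unprotected"))
```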
[1] In the SRDF SRA log, operations involving the different sites are referred to as site1 or site2. These refer to the sync site and the async site respectively.
[2] The replication can be split beforehand if IgnoreActivatedSnapshots is enabled.