Controlling SRDF states with Ansible

An Ansible playbook is an ideal control tool for manipulating the state of your remote replication. Not only can you control the storage operations, but you can follow on with the operating-system-specific tasks and complete a full application recovery in one automated workflow. Because the workflow runs the same way every time, it reduces the scope for human error.
Because the PowerMax modules for Ansible use the PowerMax REST API, which is configurable with Role Based Access Control, users can be restricted to perform only certain functions. You can delegate control of application failover and recovery to application owners by providing an account, or by enabling their LDAP user, with minimal privileges on a subset of storage groups across PowerMax arrays. For details about configuring user roles, see the Unisphere for PowerMax configuration guide.
Failing over an SRDF configuration with Ansible
Initiating a failover operation is a simple task in Ansible, resulting in:
- A suspension of data flow from production to the remote site
- The source volumes (R1) on the source array becoming read/write disabled
- The target volumes (R2) becoming read/write enabled and visible to the host at the remote site
Once the failover is complete, the application can be resumed at the remote site. Figure 37 illustrates a failed over configuration.
Figure 37.    SRDF failover and volume state
Figure 38 shows a sample Ansible task to initiate a failover operation. The full playbook is here. Any operating-system-specific steps, such as rescanning HBAs and mounting file systems, require additional tasks.
Figure 38.    Ansible task to failover a storage group to a remote site
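A failover task along these lines might look like the following minimal sketch. It assumes the `srdf` module from the `dellemc.powermax` collection and that the connection and storage group variables (`unispherehost`, `universion`, `verifycert`, `user`, `password`, `serial_no`, `sg_name`) are defined elsewhere in the playbook; check the parameter names against the module documentation for your collection version.

```yaml
# Sketch only: variable values are assumed to be defined in the play's vars.
- name: Fail over the storage group to the remote site
  dellemc.powermax.srdf:
    unispherehost: "{{ unispherehost }}"
    universion: "{{ universion }}"
    verifycert: "{{ verifycert }}"
    user: "{{ user }}"
    password: "{{ password }}"
    serial_no: "{{ serial_no }}"
    sg_name: "{{ sg_name }}"
    srdf_state: "Failover"
    state: "present"
```

Host-side tasks such as rescanning and mounting would follow this task in the same play.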
Best Practice: Configure two sets of variables, one for planned failover and one for unplanned failover. In an unplanned failover where the production site is unavailable, the values of serial_no and remote_serial_no are reversed. Taking snapshots before mounting volumes at the remote site is also recommended. The snapshot provides a failsafe in case an operational issue or human error leads to data corruption.
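As an illustration of the two variable sets, the following sketch keeps them side by side in a vars file. The dictionary names, host names, and serial numbers are hypothetical placeholders, and pointing the unplanned set at a Unisphere instance at the remote site is an assumption made for this example.

```yaml
# Hypothetical vars file: all names and values are placeholders.
planned_failover:
  unispherehost: "unisphere-site-a.example.com"
  serial_no: "000111111111"         # production array
  remote_serial_no: "000222222222"  # remote array

unplanned_failover:
  unispherehost: "unisphere-site-b.example.com"  # assumed reachable remote-site Unisphere
  serial_no: "000222222222"         # reversed: remote array is now the local array
  remote_serial_no: "000111111111"
```

A playbook could then select one set at run time, for example through an extra variable that names the scenario.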
Returning to normal operations after an outage
After the outage has been resolved and you have decided that it is time to fail back operations, you can run a playbook to return the environment to its original state.
Best Practice: As with failover, when preparing for a failback, take a snapshot of your data at both production sites. The snapshot provides a failsafe in case of an operational issue that could lead to data corruption resulting from human error.
Figure 39 shows a sample task for failing back.
Figure 39.    Failback task for an Ansible playbook
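A failback task can mirror the failover task with only the requested state changed. This is a minimal sketch assuming the `srdf` module from the `dellemc.powermax` collection and connection variables defined elsewhere in the playbook; verify the parameters against the module documentation.

```yaml
# Sketch only: variable values are assumed to be defined in the play's vars.
- name: Fail back the storage group to the production site
  dellemc.powermax.srdf:
    unispherehost: "{{ unispherehost }}"
    universion: "{{ universion }}"
    verifycert: "{{ verifycert }}"
    user: "{{ user }}"
    password: "{{ password }}"
    serial_no: "{{ serial_no }}"
    sg_name: "{{ sg_name }}"
    srdf_state: "Failback"
    state: "present"
```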
Failing back to the primary data center is not always done immediately after the issues that led to a failover have been resolved. It might be necessary to arrange a change window for a planned failback to the original primary site. In this scenario, a swap operation followed by an establish operation can be performed from the remote site to the original source.
To perform these operations in Ansible, create two tasks, using ‘Swap’ and then ‘Establish’ in srdf_state, as shown in Figure 40. We recommend setting wait_for_completion to true for the first of these operations, because the establish operation depends on the swap having completed. These operations make the array at Site B the source (R1) and the array at Site A the target (R2), resuming replication of changes from Site B to Site A and providing the same RPO that existed before the outage.
Figure 40.    Playbook tasks for swap and establish operations
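The swap and establish sequence might be sketched as two consecutive tasks like the ones below. This assumes the `srdf` module from the `dellemc.powermax` collection, with connection variables defined elsewhere and pointing at a Unisphere instance that can manage the array named in `serial_no`; confirm parameter names in the module documentation.

```yaml
# Sketch only: variable values are assumed to be defined in the play's vars.
- name: Swap SRDF personalities (R1 and R2 exchange roles)
  dellemc.powermax.srdf:
    unispherehost: "{{ unispherehost }}"
    universion: "{{ universion }}"
    verifycert: "{{ verifycert }}"
    user: "{{ user }}"
    password: "{{ password }}"
    serial_no: "{{ serial_no }}"
    sg_name: "{{ sg_name }}"
    srdf_state: "Swap"
    wait_for_completion: true  # the next task depends on the swap having finished
    state: "present"

- name: Establish replication in the new direction
  dellemc.powermax.srdf:
    unispherehost: "{{ unispherehost }}"
    universion: "{{ universion }}"
    verifycert: "{{ verifycert }}"
    user: "{{ user }}"
    password: "{{ password }}"
    serial_no: "{{ serial_no }}"
    sg_name: "{{ sg_name }}"
    srdf_state: "Establish"
    state: "present"
```

Because Ansible runs tasks in order, waiting on the swap keeps the establish request from being issued against a pairing that is still changing.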
Figure 41.    SRDF and volume state after swap and establish operations
Adding volumes to multi-site SRDF environments (three sites)
Adding volumes to a multi-site SRDF configuration is straightforward but, at the time of publishing, not directly supported in the Ansible modules. However, the Unisphere for PowerMax REST API has full control over these environments, and native API calls can be used to supplement the functionality of the Ansible modules. Use the built-in URI module from Ansible and a PUT call to the PowerMax API to modify the storage group volumes.
The Unisphere for PowerMax API documentation on the Dell Technologies developer site provides full documentation for the PUT call used in this example.
Note: Currently, you must remove the Metro DR environment temporarily, which can be a quick operation, before modifying the volume. Replication is never suspended, and RPO is not compromised at any point during the operations. You can create additional tasks to remove the Metro DR environment and revert to the Metro DR environment. The help documentation for the Metro DR module provides examples of these tasks.
Figure 42.    Task to add device to a three-site SRDF configuration using URI module and native API call
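A native API call of this kind might be sketched with `ansible.builtin.uri` as follows. The URL path reflects the Unisphere storage group resource as best understood here, and `api_version` and `add_volume_payload` are hypothetical variables: the request body must be built from the PUT call described in the Unisphere for PowerMax API documentation, so it is deliberately not spelled out.

```yaml
# Sketch only: api_version and add_volume_payload are hypothetical variables.
# Build the payload from the Unisphere for PowerMax API documentation.
- name: Add a volume to the storage group through the REST API
  ansible.builtin.uri:
    url: "https://{{ unispherehost }}:8443/univmax/restapi/{{ api_version }}/sloprovisioning/symmetrix/{{ serial_no }}/storagegroup/{{ sg_name }}"
    method: PUT
    user: "{{ user }}"
    password: "{{ password }}"
    force_basic_auth: true
    validate_certs: "{{ verifycert }}"
    body_format: json
    body: "{{ add_volume_payload }}"
  register: add_volume_result
```

Registering the result lets later tasks inspect the response, for example to confirm the new device before restoring the Metro DR environment.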
