Business continuity solutions for VMware vSphere
Implementing a business continuity solution for a production environment on the VMware virtualization platform is not overly complicated. In addition to, or in place of, a tape-based disaster recovery solution, Dell SRDF can be used as the mechanism to replicate data from the production data center to the remote data center. The copy of the data in the remote data center can be presented to a VMware ESXi cluster. The vSphere environment at the remote data center thus provides a business continuity solution.
Dell replication technologies can generate a restartable or recoverable copy of the data. The difference between the two types of copies can be confusing; a clear understanding of the differences is critical to ensure that the recovery goals for a vSphere environment can be met.
A recoverable copy of the data is one in which the application (if it supports it) can apply logs and roll the data forward to a point in time after the copy was created. The recoverable copy is most relevant in the database realm, where database administrators use it frequently to create backup copies of a database such as Oracle. In the event of a database failure, the ability to recover the database not only to the point in time when the last backup was taken, but also to roll forward subsequent transactions up to the point of failure, is critical to most business applications. Without that capability, a failure results in an unacceptable loss of all transactions that occurred since the last backup.
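The roll-forward idea can be illustrated with a minimal sketch. It is not how any particular database implements recovery; the `LogRecord` type, the `roll_forward` function, and the sample account data are hypothetical names invented for this illustration. It only shows that a recoverable copy plus an ordered log can reproduce any state between the backup and the point of failure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    """One logged change (hypothetical structure for illustration)."""
    seq: int    # monotonically increasing log sequence number
    key: str    # item that was changed
    value: str  # new value written by the transaction


def roll_forward(backup: Dict[str, str], backup_seq: int,
                 log: List[LogRecord], target_seq: int) -> Dict[str, str]:
    """Replay logged changes made after the backup, up to target_seq."""
    state = dict(backup)  # work on a copy; the backup itself stays intact
    for record in sorted(log, key=lambda r: r.seq):
        if backup_seq < record.seq <= target_seq:
            state[record.key] = record.value
    return state


# Recoverable copy taken when the log was at sequence number 10.
backup = {"acct-1": "100", "acct-2": "50"}
log = [
    LogRecord(11, "acct-1", "80"),
    LogRecord(12, "acct-2", "70"),
    LogRecord(13, "acct-1", "60"),
]

# Restoring only to the backup loses the three later transactions.
restored_to_backup = roll_forward(backup, 10, log, target_seq=10)
# Rolling forward to the point of failure keeps them.
restored_to_failure = roll_forward(backup, 10, log, target_seq=13)
```

In this sketch, `restored_to_backup` equals the original backup, while `restored_to_failure` reflects every logged transaction.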
Creating recoverable images of applications running inside virtual machines using Dell replication technology requires that the application or the virtual machine be shut down when it is copied. A recoverable copy of an application can also be created if the application supports a mechanism to suspend writes when the copy of the data is created. Most database vendors provide functionality in their RDBMS engine to suspend writes. This functionality has to be invoked inside the virtual machine when Dell technology is deployed to ensure a recoverable copy of the data is generated on the target devices.
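The ordering that matters here (suspend writes, create the copy, resume writes) can be sketched as follows. The three callables are hypothetical placeholders, not real database or array commands; in practice they would be replaced by the RDBMS vendor's suspend and resume mechanism and by whatever array operation creates the copy. The sketch only demonstrates that writes are resumed even if the copy step fails.

```python
from contextlib import contextmanager
from typing import Callable, List


@contextmanager
def writes_suspended(suspend: Callable[[], None], resume: Callable[[], None]):
    """Suspend application writes for the duration of the block, then always resume."""
    suspend()
    try:
        yield
    finally:
        resume()  # runs even if the copy step raises an exception


events: List[str] = []  # records the order of operations for this demonstration


def suspend_db_writes() -> None:
    # Hypothetical stand-in for the RDBMS command that suspends writes.
    events.append("suspend")


def resume_db_writes() -> None:
    # Hypothetical stand-in for the RDBMS command that resumes writes.
    events.append("resume")


def create_array_copy() -> None:
    # Hypothetical stand-in for the array-side copy operation.
    events.append("copy")


with writes_suspended(suspend_db_writes, resume_db_writes):
    create_array_copy()
```

Keeping the suspended window as short as possible is the usual design goal, since the application cannot commit writes while the copy is being created.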
If a copy of a running virtual machine is created using Dell Consistency technology[1] without any action inside the virtual machines, the copy is a crash-consistent, restartable image of the virtual machine. This means that when the data is used on cloned virtual machines, the operating system and/or the application goes into crash recovery. The exact implications of crash recovery in a virtual machine depend on the application that the machine supports:
Most applications and databases cannot perform roll-forward recovery from a restartable copy of the data.[2] Therefore a restartable copy of data created from a virtual machine that is running a database engine is inappropriate for performing backups. However, applications that use flat files or virtual machines that act as file servers can be backed up from a restartable copy of the data. This is possible since none of the file systems provide a roll-forward logging mechanism that enables recovery.
Synchronous SRDF (SRDF/S) is a method of replicating production data changes between locations less than 200 km apart. Synchronous replication takes writes that are inbound to the source PowerMax and copies them to the target PowerMax. The resources of the storage arrays are exclusively used for the copy. The write operation from the virtual machine is not acknowledged back to the host until both PowerMax arrays have a copy of the data in their cache. Readers should consult https://www.dell.com/support/home/en-us for further information about SRDF/S.
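Because every write waits for the remote array, distance adds directly to write latency. The following back-of-the-envelope sketch estimates only the propagation delay. It assumes light travels through fiber at roughly 200,000 km/s (an approximation, not a figure from this guide), and it ignores switch, protocol, and array processing time as well as the number of round trips a given write actually requires.

```python
FIBER_KM_PER_SECOND = 200_000.0  # assumption: roughly two-thirds of the speed of light in a vacuum


def round_trip_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay in milliseconds added to each synchronous write."""
    one_way_seconds = distance_km / FIBER_KM_PER_SECOND
    return 2 * one_way_seconds * round_trips * 1000.0


# Added latency per write at the 200 km distance mentioned above,
# assuming a single round trip per write.
added_latency_200km = round_trip_ms(200)
```

Under these assumptions the added delay at 200 km is about 2 ms per round trip, which is significant next to typical all-flash array response times.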
Figure 192 is a schematic representation of the business continuity solution that integrates a VMware environment and SRDF technology. The solution shows two virtual machines accessing devices on PowerMax storage arrays on the production site. The boot virtual devices (vmdks) are stored on VMFS, while the data LUNs are RDMs. On the disaster recovery site (DR), those same devices are read-only and in the event of a failure the VMs in the schematic would be registered and powered on, hence their ghostly placeholder status. However, while there is no disaster, the resources at the DR site are utilized as a test environment. This is accomplished through TimeFinder/SnapVX. SnapVX snapshots of the R2 read-only devices are taken and then linked to read-write devices which are presented to the VMware environment at the DR site. The VMs are registered and as long as there is no disaster, the compute and array resources do not remain idle and serve as the test environment.
Figure 192. Business continuity solution using SRDF/S in a VMware environment
Note: As most customers present R2s to the DR site, be sure to review Detaching the remote volume.
SRDF/A, or asynchronous SRDF, is a method of replicating production data changes from one PowerMax to another using delta set technology. Delta sets are the collection of changed blocks grouped together by a time interval that can be configured at the source site. The default time interval is 30 seconds. The delta sets are then transmitted from the source site to the target site in the order they were created. SRDF/A preserves the dependent-write consistency of the database at all times at the remote site. Further details about SRDF/A can be obtained on https://www.dell.com/support/home/en-us.
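The delta set concept can be modeled in a few lines. This is a simplified illustration, not the array's implementation: the `build_delta_sets` and `apply_in_order` functions are hypothetical, and keeping only the latest write to a block within one delta set is an assumption of this sketch. It shows how writes are grouped by time interval and why applying whole delta sets in creation order leaves the remote copy at a consistent point in time.

```python
from typing import Dict, Iterable, List, Tuple

CYCLE_SECONDS = 30  # default interval stated above

# (timestamp in seconds, block number, data written)
Write = Tuple[float, int, bytes]


def build_delta_sets(writes: Iterable[Write],
                     cycle_seconds: float = CYCLE_SECONDS) -> List[Dict[int, bytes]]:
    """Group changed blocks into delta sets, one per time interval, in creation order."""
    cycles: Dict[int, Dict[int, bytes]] = {}
    for timestamp, block, data in sorted(writes, key=lambda w: w[0]):
        cycle = int(timestamp // cycle_seconds)
        # Sketch assumption: a later write to the same block in the same
        # interval replaces the earlier one.
        cycles.setdefault(cycle, {})[block] = data
    return [cycles[c] for c in sorted(cycles)]


def apply_in_order(target: Dict[int, bytes],
                   delta_sets: List[Dict[int, bytes]]) -> Dict[int, bytes]:
    """Apply complete delta sets to the target in the order they were created."""
    for delta_set in delta_sets:
        target.update(delta_set)
    return target


writes = [
    (1.0, 7, b"a"),
    (12.0, 7, b"b"),   # same block, same interval as the first write
    (29.0, 9, b"x"),
    (31.0, 7, b"c"),   # second interval
    (65.0, 3, b"z"),   # third interval
]
delta_sets = build_delta_sets(writes)
remote = apply_in_order({}, delta_sets)
```

Stopping after any whole delta set leaves `remote` equal to the source as it was at the end of that interval, which is the property that preserves dependent-write consistency.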
The distance between the source and target PowerMax is unlimited and there is no host impact. Writes are acknowledged immediately when they hit the cache of the source PowerMax. The basic process is included in the following steps:
The multi-cycle mode on the PowerMax employs a queuing mechanism that allows smaller delta sets to be transferred, reducing RPO. Multi-cycle mode is represented in Figure 193.
Figure 193. SRDF/A multimode
Before the asynchronous mode of SRDF can be established, an initial copy of the production data has to occur. In other words, a baseline of all the volumes that are going to participate in the asynchronous replication must be created first. This is usually accomplished using the adaptive-copy mode of SRDF, which is more efficient for this initial bulk load. Once this completes, the mode is changed to asynchronous.
SRDF/Metro is a feature that provides active/active access to the R1 and R2 of an SRDF configuration. In traditional SRDF, R1 devices are Read/Write accessible while R2 devices are Read Only/Write Disabled. In SRDF/Metro configurations both the R1 and R2 are Read/Write accessible. This is accomplished by having the R2 take on the personality of the R1 in terms of geometry and, most importantly, the WWN. By sharing a WWN, the R1 and R2 appear to the host as a single shared virtual device across the two arrays. A host, or typically multiple hosts in a cluster, can read and write to both the R1 and R2. SRDF/Metro ensures that each copy remains current and consistent and addresses any write conflicts that might arise. In VMware environments, the ESXi hosts from two different data centers can be placed in the same vCenter, forming a VMware Metro Storage Cluster (vMSC). A simple diagram of the feature is available in Figure 194.
Figure 194. VMware Metro Storage Cluster with SRDF/Metro
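The effect of the shared WWN on host presentation can be sketched as follows. This is a conceptual model only, not how ESXi multipathing is implemented; the `Path` type, the `group_paths_by_wwn` function, and the WWN strings are made-up values for illustration. A host that identifies devices by WWN sees two separate devices in traditional SRDF, but one device with paths to both arrays in SRDF/Metro.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Path(NamedTuple):
    """One host path to a device (hypothetical structure for illustration)."""
    array: str  # array that serves this path
    role: str   # SRDF role of the device behind the path
    wwn: str    # identity the device reports to the host


def group_paths_by_wwn(paths: List[Path]) -> Dict[str, List[Path]]:
    """Group paths into devices the way a host keyed on WWN would."""
    devices: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        devices[path.wwn].append(path)
    return dict(devices)


# Traditional SRDF: R1 and R2 report different (made-up) WWNs.
traditional = group_paths_by_wwn([
    Path("array-A", "R1", "wwn-example-0001"),
    Path("array-B", "R2", "wwn-example-0002"),
])

# SRDF/Metro: the R2 takes on the WWN of the R1.
metro = group_paths_by_wwn([
    Path("array-A", "R1", "wwn-example-0001"),
    Path("array-B", "R2", "wwn-example-0001"),
])
```

Here `traditional` contains two devices with one path each, while `metro` contains a single device reachable through both arrays.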
Figure 195 is a schematic representation of the business continuity solution that integrates a VMware environment and SRDF/Metro. The solution shows two virtual machines accessing devices on PowerMax arrays on VMFS.
Figure 195. Business continuity solution using SRDF/Metro in a VMware environment with VMFS
In an SRDF/Metro environment with vMSC, all the ESXi hosts see the same VMFS datastore, though some hosts use the R1 and others the R2. The configuration shown in Figure 195 uses the R2 only for HA. VMs can be moved across data centers with vMotion, rather than Storage vMotion, despite the use of two separate arrays.
SRDF/Metro supports 3-site configurations with a leg off either the R1, R2 or both members of a pair. This means that neither the R1 nor R2 is aware of the existence of an asynchronous or adaptive copy leg off the other. Up until PowerMaxOS 5978 Q3 2020, SRDF/Metro did not support a Star-like configuration where either the R1 or R2 could update a single remote leg. Beginning with PowerMaxOS 5978 Q3 2020, Dell offers SRDF/Metro Smart DR which enhances SRDF/Metro and provides Star-like functionality. The SRDF/Metro Smart DR feature extends the high availability (HA) solution of SRDF/Metro to a third array and supports the ability to have Geo distance DR support for a device in a Metro configuration.
SRDF/Metro Smart DR integrates SRDF/Metro and SRDF/A, enabling HA DR by closely coupling the SRDF/A sessions on each side of the Metro pair to replicate to a single DR device. While only one side of the SRDF/Metro pair (R1) will update the SRDF/A leg at one time, both are capable of doing so. The asynchronous replication of data from either side of the SRDF/Metro pair to a tertiary site enables Failover/Failback to the DR site while retaining the Metro environment. SRDF/Metro Smart DR requires a minimum of PowerMaxOS 5978 Q3 2020 and Solutions Enabler 9.2 running on the three PowerMax arrays with Witness configured. Bias is not supported.
Note: In Unisphere for PowerMax, SRDF/Metro Smart DR will also be referred to as MetroDR.
Figure 196 is a schematic representation of the business continuity solution that integrates a VMware environment and SRDF/Metro Smart DR. The solution shows VMware Metro Storage Cluster (vMSC) with a third SRDF/A site where a SnapVX snapshot is taken of the remote R2 and mounted in a test environment. That same test environment could be used for DR if the primary SRDF/Metro became unavailable.
Figure 196. Business continuity solution using SRDF/Metro Smart DR in a VMware environment with VMFS
For more detail on SRDF/Metro configurations, see the white paper Best Practices for Using Dell SRDF/Metro in a VMware vSphere Metro Storage Cluster.
The SRDF/Star disaster recovery solution provides advanced multi-site business continuity protection for enterprise environments. It combines the power of Symmetrix Remote Data Facility (SRDF) synchronous and asynchronous replication, enabling the most advanced three-site business continuance solution available today.
SRDF/Star enables concurrent SRDF/S and SRDF/A operations from the same source volumes with the ability to incrementally establish an SRDF/A session between the two remote sites in the event of a primary site outage — a capability only available through SRDF/Star software.
This capability takes the promise of concurrent synchronous and asynchronous operations (from the same source device) to its logical conclusion. SRDF/Star allows you to quickly re-establish protection between the two remote sites in the event of a primary site failure, and then just as quickly restore the primary site when conditions permit.
With SRDF/Star, enterprises can quickly resynchronize the SRDF/S and SRDF/A copies by replicating only the differences between the sessions, allowing for much faster resumption of protected services after a source site failure.
Note: SRDF/Metro does not support SRDF/Star configurations; instead, it supports SRDF/Metro Smart DR.
Concurrent SRDF allows the same source data to be copied concurrently to PowerMax arrays at two remote locations. As Figure 197 shows, the capability of a concurrent R1 device to have one of its links synchronous and the other asynchronous is also supported as an SRDF/Star topology. Additionally, SRDF/Star allows the reconfiguration between concurrent and cascaded modes dynamically.
Figure 197. Concurrent SRDF
Note: SRDF/Metro supports 3-site Concurrent SRDF.
Cascaded SRDF allows a device to be both a synchronous target (R2) and an asynchronous source (R1) creating an R21 device type. SRDF/Star supports the cascaded topology and allows the dynamic reconfiguration between cascaded and concurrent modes. See Figure 198 for a representation of the configuration.
Figure 198. Cascaded SRDF
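The difference between the concurrent and cascaded topologies can be expressed as a small classification over replication links. This is an illustrative model only; the `Link` type, the `device_type` function, and the site names A, B, and C are hypothetical. A device that is both the target of one link and the source of another is classified as R21, matching the cascaded description above.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    """One SRDF replication link (hypothetical structure for illustration)."""
    source: str
    target: str
    mode: str  # "sync" or "async"


def device_type(device: str, links: List[Link]) -> str:
    """Classify a device by the links it participates in."""
    outgoing = [link for link in links if link.source == device]
    incoming = [link for link in links if link.target == device]
    if incoming and outgoing:
        return "R21"            # target of one link and source of another
    if len(outgoing) > 1:
        return "concurrent R1"  # same source copied to two remote locations
    if outgoing:
        return "R1"
    if incoming:
        return "R2"
    return "not replicated"


# Cascaded: A -> B synchronously, then B -> C asynchronously.
cascaded = [Link("A", "B", "sync"), Link("B", "C", "async")]
# Concurrent: A -> B synchronously and A -> C asynchronously.
concurrent = [Link("A", "B", "sync"), Link("A", "C", "async")]

cascaded_types = {site: device_type(site, cascaded) for site in "ABC"}
concurrent_types = {site: device_type(site, concurrent) for site in "ABC"}
```

In this model the two topologies differ only in where the asynchronous link originates, which is why a dynamic reconfiguration between them changes the role of the device at the middle site.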
[1] See Copying virtual machines with TimeFinder for more detail.
[2] More recent versions of the Oracle database are able to roll-forward from a crash-consistent copy.
[3] The SRDF SRA supports SRDF/Metro Smart DR configurations with VMware SRM beginning with SRDF SRA 10.1.0.