To investigate further how a business continuity solution can be provided, this paper examines the best-practice recovery configurations possible with PowerOne, and how they can be extended to achieve the required outcomes.
We need to address how physical connectivity is handled at the storage and network layers, and how the logical behavior of the vSphere environments is configured to minimize RPO. To use traditional configuration techniques for non-autonomous extended use cases such as SRDF, the PowerOne Controller feature is invoked to reserve components and allow their seamless hand-off. We rely on well-proven tools such as VMware Site Recovery Manager to automate failover and failback operations, and to configure the production network at the vSphere remote site.
In SRDF/A scenarios, replication is often performed over Ethernet, whereas Metro and Synchronous deployments use dedicated Fibre Channel ports for replication traffic. For PowerOne, we recommend a separate fabric for replication. Creating a separate virtual SAN inside the existing MDS pair is not a best practice, because replication ports use different optics and a different configuration (buffer-to-buffer credit depth).
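As a simple illustration of this recommendation, the following sketch flags replication ports that share a fabric with host-connectivity ports. The port/role dictionary model is an assumption made for illustration only, not a PowerOne or MDS data structure:

```python
def check_replication_fabric(ports):
    """Return the names of replication ports that share a fabric with
    host-connectivity ports, which violates the separate-fabric
    recommendation. Each port is a dict with 'name', 'role'
    ('host' or 'replication'), and 'fabric' keys (illustrative model).
    """
    host_fabrics = {p["fabric"] for p in ports if p["role"] == "host"}
    return [p["name"] for p in ports
            if p["role"] == "replication" and p["fabric"] in host_fabrics]
```

A design that keeps replication on its own fabric returns an empty list; any names returned indicate ports to re-cable or re-zone.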
In this scenario, to configure the PowerOne network, we will:
- Place the servers in manual mode.
- Reserve the port groups to be used.
- Use the Dell EMC Open Manage Enterprise (OME) tool to place the needed VLANs in a server template.
- Use Dell EMC Open Manage Network Integration (OMNI) to deploy those VLANs across the PowerOne Fabric network.
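The four steps above can be sketched as payload builders, one per step. The resource names and field names below are illustrative assumptions, not the documented PowerOne Controller, OME, or OMNI APIs:

```python
def manual_mode_payload(server_ids):
    """Step 1: place the selected servers in manual (bare-metal) mode."""
    return {"resource": "servers", "ids": list(server_ids), "mode": "MANUAL"}

def reserve_port_groups_payload(port_group_ids):
    """Step 2: reserve the port groups for bare-metal consumption."""
    return {"resource": "port-groups", "ids": list(port_group_ids),
            "reserved": True}

def ome_template_vlans(template_name, vlan_ids):
    """Step 3: VLANs to attach to an OME server template."""
    return {"template": template_name, "tagged_vlans": sorted(set(vlan_ids))}

def omni_fabric_deployment(fabric_id, vlan_ids):
    """Step 4: VLANs that OMNI deploys across the PowerOne fabric."""
    return {"fabric": fabric_id, "vlans": sorted(set(vlan_ids))}
```

The same builders are used for both sites, which mirrors the repeated per-site configuration described next.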
The following diagram illustrates the steps for configuring and connecting PowerOne at site one. The same steps are repeated to configure and connect site two.
Figure 6. Configuring and connecting the two sites
Figure 7. Reserving PortGroups in PowerOne for bare metal consumption
The newly created VLAN (vlan334) is depicted in the following figure:
Figure 8. Newly created vlan334
PowerOne with SRDF/Metro provides a business continuity solution in which RPO and RTO can be defined as zero or near-zero when VMware vSphere Metro Storage Cluster technology and architecture are also implemented.
The following table documents the key considerations when designing a PowerOne with SRDF/Metro architecture.
Table 1. Considerations for a PowerOne with SRDF/Metro architecture
Compute, storage, and network configuration
At each PowerOne site:
- Allocate servers and storage ports with the PowerOne Controller API or Navigator
- Select the servers to form the required vSphere Metro Storage Cluster(s) and mark them as being in Manual Mode, to ensure they are not considered for other PowerOne automated CRG outcomes
- Select the storage ports in the PowerMax to be used for server access to storage and mark them in Manual Mode (Bare Metal), to ensure they are also not considered for PowerOne automated CRG outcomes
- Configure vSphere Management IP Networking
- Configure the L3 and L2 subnets as determined by the chosen vSphere Management strategy
- Update server templates for chosen MX Compute Sleds to present required VLANs
- Configure Servers
- Install vSphere, discover the storage volumes, and create the VMFS datastores on the first server only
- All other servers will then discover the VMFS datastores on those volumes
- Configure vCenter Management
- If using vCenter HA, a 3rd site vCenter witness can be provided by a 3rd PowerOne system, a PowerMax array, or a virtual machine, and can share the same site as the SRDF/Metro witness
- Configure Workload IP Networking
- Within a vSphere Metro Storage Cluster, Workload VMs must retain their IP addressing irrespective of which physical site hosts the VM
- Dell Smart Fabric Services provides a Spine and Leaf architecture that allows L2 subnets to span multiple sites through BGP EVPN or 802.1Q tagging
- VMware NSX also provides for an L2 subnet to span multiple sites managed within the vSphere environment
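Because workload VMs must retain their IP addressing on either site, every workload VLAN must be present in the fabric at both sites. A minimal sketch of that validation, assuming simple per-site VLAN lists rather than any real fabric inventory API:

```python
def stretched_vlan_gaps(site_a_vlans, site_b_vlans, workload_vlans):
    """Return the workload VLANs that are not present at BOTH sites.
    A vSphere Metro Storage Cluster needs each workload L2 subnet
    stretched across sites (for example via Smart Fabric Services
    BGP EVPN, 802.1Q tagging, or NSX), so a non-empty result means
    VMs on those VLANs cannot keep their IPs after a cross-site restart.
    """
    present_at_both = set(site_a_vlans) & set(site_b_vlans)
    return sorted(set(workload_vlans) - present_at_both)
```

An empty result confirms the workload networking prerequisite for the vMSC design.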
Follow VMware Best Practices to configure vCenter for traditionally configured clusters.
vCenter can operate in a vSphere Metro Storage Cluster environment in vCenter HA mode.
- Requires a 3rd site vCenter witness to determine site isolation outcomes
- A primary vCenter will be deployed on a host in one half of a vMSC and secondary vCenter on a host in the alternate workload site
- vCenter HA configurations can operate with no requirement for a common L2 network across both sites
- vCenter can be restarted by vSphere HA in the event of a site failure
- The vCenter HA architecture does not require preservation of the management IP addresses through a common L2 subnet, cross site
- Decisions around vCenter embedded or external PSCs may require an external load balancer if the PSCs are to be shared
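The availability behavior of a vCenter HA deployment with a 3rd-site witness reduces to a simple majority rule, sketched below as an illustration (not VMware's implementation):

```python
def vcha_has_quorum(active_up, passive_up, witness_up):
    """vCenter HA keeps the management plane available as long as at
    least two of the three nodes (active, passive, and the 3rd-site
    witness) can communicate; the witness is what allows the cluster
    to distinguish a node failure from site isolation.
    """
    return sum([active_up, passive_up, witness_up]) >= 2
```

For example, losing the entire workload site hosting the active vCenter still leaves the passive node and witness in contact, so the passive node can take over.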
Given that the only potential requirement for a physical stretched Layer 2 network is recovery of NSX-T Management, another key decision is whether to build a dedicated management cluster, so that the Layer 2 stretch complexity is isolated to that one cluster, or to share management and workloads in a single (self-managed) cluster.
NSX-T Management Design
NSX-T Management VMs can be automatically restarted across a vSphere Metro Storage Cluster
- Requires a common L2 subnet provided outside NSX, for example by Dell Smart Fabric Services extended Spine and Leaf architecture
- NSX-T Management VMs can be configured via DHCP instead of hardcoded IP addresses to avoid an L2 subnet dependency
- Using DNS to provide a common name and a low TTL for the management VMs allows for VMs to be recovered by the vMSC using regular vSphere HA
- When recovered, the VMs acquire a new IP address matching the L3 subnet of the recovery site, and DNS must be updated accordingly
Using DNS recovery of NSX-T Management, coupled with vCenter HA, allows for a complete L3 only vSphere management plane across sites
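The DNS-based recovery described above can be sketched as a record update against an in-memory stand-in for the DNS service; the record format and default TTL here are illustrative assumptions:

```python
def recover_nsx_manager_record(dns, fqdn, new_ip, ttl=60):
    """After vSphere HA restarts an NSX-T Manager VM at the alternate
    site, the VM receives an address from that site's L3 subnet. The
    common DNS name is then repointed at the new address with a low
    TTL so that clients re-resolve quickly. `dns` is a plain dict
    standing in for the real DNS service.
    """
    dns[fqdn] = {"ip": new_ip, "ttl": ttl}
    return dns[fqdn]
```

The low TTL is what keeps the effective management-plane RTO small without any stretched L2 subnet.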
- SRDF/Metro can operate across two workload data centers or with a 3rd site that can arbitrate and act as witness in automated site failure analysis.
- Where there is no 3rd site in the design, PowerMax SRDF/Metro operates in bias mode, where it will prefer one site over another when connectivity is lost
- Using a 3rd site as a witness will provide the highest accuracy in determining failure conditions and minimize false positive failover invocations
- The 3rd site SRDF/Metro witness can be provided by a 3rd PowerOne System, a PowerMax array, or a virtual machine.
- Dell Technologies recommends the use of a 3rd site witness where possible for the most complete automated operational experience
- All vSphere and NSX management components for vSphere Metro Storage Clusters can be deployed in a 3rd site
- This has the advantage of being able to manage the virtual workload and networking without a complicated recovery process
- If the 3rd site suffers from a network isolation event, workloads will continue to run in the vMSC environments with vSphere HA locally managing failover
- If such a compound failure occurs the inaccessibility of the NSX Manager VMs will not prevent VM recovery, as a broadcast mechanism will be activated
- Compounding multiple failure scenarios and investing to solve for those low likelihood events should be considered carefully from a cost/risk perspective
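The difference between witness arbitration and bias mode can be illustrated with a simplified decision function; this is a deliberate simplification of the actual PowerMax state machine, for reasoning about outcomes only:

```python
def metro_survivor(r1_sees_r2, witness_sees_r1, witness_sees_r2, bias="R1"):
    """Simplified SRDF/Metro arbitration outcome. With a 3rd-site
    witness, the side the witness can still reach keeps the device
    pair accessible; when the witness cannot disambiguate (or there
    is no witness), the preconfigured bias side wins. Returns "both"
    when the inter-site link is healthy.
    """
    if r1_sees_r2:
        return "both"            # no failure: both sides stay active
    if witness_sees_r1 and not witness_sees_r2:
        return "R1"              # witness isolates R2: R1 survives
    if witness_sees_r2 and not witness_sees_r1:
        return "R2"              # witness isolates R1: R2 survives
    return bias                  # ambiguous or no witness: bias side wins
```

This makes the recommendation concrete: without a witness, a failure of the bias site's array is indistinguishable from a link failure, so the surviving site can be suspended unnecessarily.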
Data Migration and Snapshots
- SRDF can be used to migrate data into storage volumes configured on PowerOne that support a cluster. While SRDF is writing to the volumes, they will be in a Not Ready state and therefore inaccessible to vSphere; this is the expected behavior.
- PowerMax snapshots can be configured as usual in Unisphere; making them mountable to a cluster requires traditional storage and vSphere knowledge
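A small sketch of the Not Ready behavior during migration: only volumes that have finished synchronizing should be presented to vSphere. The volume dictionary and state names are illustrative, not Unisphere output:

```python
def mountable_volumes(volumes):
    """Filter to the volumes that are safe to present to vSphere.
    While SRDF is still writing to a migration target it reports a
    Not Ready state and must be excluded. Each volume is a dict with
    'id' and 'state' keys (illustrative states: 'Ready', 'NotReady').
    """
    return [v["id"] for v in volumes if v["state"] == "Ready"]
```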
Site design and Intersite replication
- Architectural considerations include directly connecting PowerMax ports to a dedicated replication network versus providing connectivity via the Connectrix SAN switches used to connect MX7000 Compute blades to PowerMax
- Direct connection to a dedicated replication network avoids a complex host connectivity SAN design
- Sharing the host connectivity SAN could provide for cross-site host to storage connectivity
Host access to storage can be configured to allow access to the PowerMax storage at both sites, known as uniform host access, or designed for access to the local-site PowerMax only, known as non-uniform host access.
Uniform host access allows a PowerMax array failure to be tolerated without promoting it to a site failure condition; however, it requires a more complex cross-site host-to-storage SAN design that can lead to unpredictable application I/O latency.
- Dell Technologies recommends non-uniform host access to avoid this configuration and operational complexity; the design of PowerMax makes it one of the most highly available components in a data center and the least likely to fail.
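The uniform versus non-uniform distinction can be expressed as a simple classification over a host's zoned arrays; the set-based model below is an illustration, not a SAN-management API:

```python
def classify_host_access(zoned_arrays, local_array):
    """Classify a host's SAN zoning. Non-uniform (the recommended
    design) means the host reaches only the local-site PowerMax;
    uniform means it is zoned to arrays at more than one site.
    `zoned_arrays` is the set of arrays the host can reach.
    """
    arrays = set(zoned_arrays)
    if arrays == {local_array}:
        return "non-uniform"
    if len(arrays) > 1:
        return "uniform"
    return "unknown"   # zoned to a single non-local array: review design
```

Running this across all hosts in a vMSC makes it easy to confirm the cluster consistently follows the recommended non-uniform design.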