Design recommendations for VPLEX and metro node
The following recommendations apply to VPLEX and metro-node-based Integrated Data Protection solutions that provide business continuity and DRR.
Adhere to the following best practices when configuring the management network for VPLEX on converged systems:
- The VPLEX Management Server in each converged system must connect to the converged system management network (101 or 1801 or OOB).
- VPLEX Metro requires a routable connection between the VPLEX management servers for each cluster, and between each management server and the Cluster Witness server. If a firewall exists between any of these servers, it must allow ICMP and IPsec traffic in both directions. For more information, see the VPLEX Security Configuration Guide on the Dell Technologies Support website.
- Metro node in a Metro configuration requires a routable connection between the director’s management IP addresses of each cluster in the Metro system. If a firewall exists between any of these servers, see the information in the Metro Node Security Configuration Guide on the Dell Technologies Support website.
Ideally, this routable connection does not use the L2 DCI link. You do not have to extend the management VLAN.
- Link latency between the two VPLEX management servers and the VPLEX Cluster Witness server must not exceed 1 second.
- The IP management network must not be able to route packets to the reserved IPv4 VPLEX subnets 128.221.252.0/24 and 128.221.253.0/24.
- The IP management network must not be able to route packets to the reserved IPv4 metro node subnets. For the list of reserved subnets, see the Metro Node Security Configuration Guide on the Dell Technologies Support website.
- The management servers (MMCSs) for the VPLEX VS6 require two management network connections and two IP addresses per VPLEX cluster.
Even though the VPLEX VS6 uses two MMCSs, the second MMCS does not currently provide any function (such as HA or failover). However, it must be configured with an IP address and remain operational at all times. A failure of MMCS-B generates alerts and results in a Field Replaceable Unit (FRU) event.
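The management-network rules above can be sketched as a pre-deployment check. This is a minimal illustration, not a Dell tool: the reserved subnets are the VPLEX values cited above (confirm them against the Security Configuration Guide), and the function name and one-second limit simply encode the rules in this section.

```python
# Sketch: validate a proposed management-network plan against the rules
# above. RESERVED_SUBNETS holds the VPLEX reserved subnets from this
# section -- confirm against the VPLEX Security Configuration Guide.
import ipaddress

RESERVED_SUBNETS = [
    ipaddress.ip_network("128.221.252.0/24"),
    ipaddress.ip_network("128.221.253.0/24"),
]
MAX_WITNESS_LATENCY_S = 1.0  # management server <-> Cluster Witness limit


def check_management_plan(routed_subnets, measured_latency_s):
    """Return a list of rule violations for a proposed management network.

    routed_subnets     -- subnets reachable from the IP management network
    measured_latency_s -- worst observed latency to the Cluster Witness
    """
    violations = []
    for subnet in (ipaddress.ip_network(s) for s in routed_subnets):
        for reserved in RESERVED_SUBNETS:
            if subnet.overlaps(reserved):
                violations.append(f"routes into reserved subnet {reserved}")
    if measured_latency_s > MAX_WITNESS_LATENCY_S:
        violations.append(
            f"witness latency {measured_latency_s}s exceeds "
            f"{MAX_WITNESS_LATENCY_S}s limit")
    return violations


# A plan that routes into a reserved subnet and shows 2 s latency
# fails both checks.
print(check_management_plan(["10.1.0.0/16", "128.221.252.0/25"], 2.0))
```

A compliant plan (no overlap with the reserved subnets, latency under one second) returns an empty list.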
Workload mobility and extending VLANs
When you configure workload mobility solutions, adhere to the following best practices:
- Ensure that there is Layer 2 (L2) adjacency between the two converged system production networks, so that connectivity is preserved after a live vSphere vMotion migration or a VMware HA-triggered VM restart event.
- At a minimum, extend the ESXi Management (205 or 1611) VLAN that hosts the VMware vSphere management VMs between sites. This requires trunking, outside the converged system, a subset of VLANs that are generally considered internal.
- The customer is responsible for the L2 extension between sites. The technology that they choose for this (OTV or back-to-back vPC) must comply with latency requirements.
- VMware vSphere 6.0 introduced Layer 3 vMotion capabilities, so you do not have to extend at Layer 2 the VLAN 206 that is used for VMware vSphere vMotion on the compute hosts.
- For full resilience, provide gateway redundancy between sites, ideally implementing FHRP isolation to avoid hosts and virtual machines unnecessarily crossing the DCI link to reach their gateway.
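The L2-adjacency requirement above boils down to a simple property at the IP layer: a migrated VM keeps its address and gateway, so both sites must present the same extended subnet. The sketch below illustrates that property with invented addresses; it is not a substitute for verifying the actual DCI trunking.

```python
# Sketch: a migrated VM keeps connectivity only if both sites present the
# same extended subnet and the gateway lives inside it. Addresses are
# illustrative examples, not values from any real deployment.
import ipaddress


def same_l2_segment(site_a_net: str, site_b_net: str, gateway: str) -> bool:
    """True if the production network is L2-adjacent across sites for a
    VM using `gateway`, so vMotion or an HA restart preserves reachability."""
    net_a = ipaddress.ip_network(site_a_net)
    net_b = ipaddress.ip_network(site_b_net)
    return net_a == net_b and ipaddress.ip_address(gateway) in net_a


print(same_l2_segment("192.168.10.0/24", "192.168.10.0/24", "192.168.10.1"))
print(same_l2_segment("192.168.10.0/24", "192.168.20.0/24", "192.168.10.1"))
```

The second call fails because the sites present different subnets: after migration the VM's gateway would be unreachable, which is exactly what extending the VLAN (and FHRP isolation for a local gateway) prevents.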
VPLEX Metro detach rules
When configuring VPLEX Metro, bear in mind the following considerations:
- A detach rule is defined for each VPLEX Metro distributed vVol.
- When a communication failure occurs between the two clusters in a VPLEX Metro configuration, the detach rule identifies which VPLEX cluster must detach its mirror leg, allowing services to continue.
- The purpose of defining a preferred site is to eliminate the possibility of a split-brain scenario in which both VPLEX clusters continue to allow I/O during a communication failure.
- After a complete communication failure between the two VPLEX clusters, the preferred site continues to provide service to the distributed vVol.
- The other VPLEX cluster suspends I/O service to the volumes. That cluster is referred to as the nonpreferred site.
- The detach rule is at the distributed vVol level so that either converged system in a VPLEX Metro configuration can be the preferred site for some distributed vVols and the nonpreferred site for others.
- A VPLEX Metro instance can support up to 5,000 distributed vVols, and each volume has its own detach rule.
- VPLEX Witness failure-handling semantics apply only to the distributed vVols in a consistency group (CG). CGs have a bias rule, similar to a detach rule, that determines the preferred site.
- All distributed vVols common to the same set of VMs must be in one CG. All VMs that are associated with that CG must be located at the preferred site.
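The per-volume behavior described above can be modeled in a few lines. This is an illustrative sketch only: the site and volume names are invented, and the model deliberately ignores CG bias and Witness semantics to show just the basic detach rule.

```python
# Sketch: a minimal model of per-volume detach rules. Site and volume
# names are invented for illustration.
from dataclasses import dataclass


@dataclass
class DistributedVolume:
    name: str
    preferred_site: str  # detach rule: this cluster keeps servicing I/O


def io_state_after_partition(volumes, site):
    """Which volumes continue I/O at `site` after intersite communication
    is lost: the preferred site continues, the nonpreferred site suspends."""
    return {v.name: ("continue" if v.preferred_site == site else "suspend")
            for v in volumes}


# Because the rule is per distributed volume, either converged system can
# be preferred for some volumes and nonpreferred for others.
vols = [DistributedVolume("vol1", "site-A"),
        DistributedVolume("vol2", "site-B")]
print(io_state_after_partition(vols, "site-A"))
# {'vol1': 'continue', 'vol2': 'suspend'}
```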
Failure conditions that invoke VPLEX detach rules
The following failure conditions invoke VPLEX detach rules:
- A total VPLEX cluster failure at one site (all directors in a cluster)
- A VPLEX WAN partition
Total VPLEX cluster failure at one site
The following conditions occur during the failure and after the failure is resolved:
- A complete VPLEX cluster failure triggers detach rule behavior because the surviving VPLEX cluster cannot distinguish intersite communication loss from a complete VPLEX cluster failure.
- The distributed vVols whose preferred site is the surviving VPLEX cluster continue to run without interruption.
- The distributed vVols whose preferred site is the site with the failed VPLEX cluster enter I/O suspension.
- After the VPLEX cluster failure is resolved, the distributed vVols are reestablished, enabling I/O on both VPLEX clusters in a Metro configuration.
VPLEX WAN partition
The following conditions occur during the failure and after the failure is resolved:
- The VPLEX cluster WAN partition (intersite communication failure) also triggers execution of the detach rule.
- Each distributed vVol allows I/O to continue at the preferred site and suspends I/O at the nonpreferred site.
- After the VPLEX cluster WAN partition condition is resolved, the distributed vVols are re-established, enabling I/O on both VPLEX clusters in a Metro configuration.
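Both failure conditions above produce the same detach-rule outcome, because the surviving cluster cannot tell them apart; only resolution of the failure restores I/O at both sites. The sketch below encodes that state transition with invented site and state names.

```python
# Sketch: from the preferred cluster's point of view, a total remote
# cluster failure and a WAN partition are indistinguishable, so both
# invoke the detach rule; resolving the failure re-establishes the
# distributed volumes. Names and states are illustrative.


def detach_rule_outcome(failure, preferred_site):
    """Return per-site I/O state for one distributed volume.

    failure -- 'cluster_failure', 'wan_partition', or 'resolved'
    """
    if failure == "resolved":
        # Distributed volumes are re-established: I/O enabled at both sites.
        return {"site-A": "enabled", "site-B": "enabled"}
    # Any total loss of intersite communication triggers the detach rule:
    # the preferred site continues, the nonpreferred site suspends.
    return {site: ("enabled" if site == preferred_site else "suspended")
            for site in ("site-A", "site-B")}


print(detach_rule_outcome("wan_partition", "site-A"))
print(detach_rule_outcome("cluster_failure", "site-A"))
print(detach_rule_outcome("resolved", "site-A"))
```

Note that the first two calls return identical states, which is the point of L37's observation: the detach rule must fire in both cases.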
VPLEX Metro persistent device loss
VMware vSphere does not recognize all types of total path failure when running in VPLEX environments. To remedy this issue, configure VPLEX Metro for persistent device loss under the advanced VMware ESXi settings, following these guidelines:
- VMware vSphere recognizes two types of total path failure to a VMware ESXi server. Either condition can be declared by the VMware ESXi server following a failure condition.
- All paths down (APD): The host loses all paths to the device without receiving a SCSI sense code from the array, so the VMware ESXi server treats the loss as potentially transient.
- Persistent device loss (PDL): PDL is a state declared by a VMware ESXi server when a SCSI sense code is sent from the underlying storage array (in this case, VPLEX) to the VMware ESXi host, informing the host that the paths can no longer be used. This condition can occur if VPLEX suffers a WAN partition causing storage volumes at the non-preferred location to suspend. VPLEX sends the PDL SCSI sense code to the VMware ESXi server from the site that is suspending (the non-preferred site).
By default, VMware HA does not recognize a SCSI PDL state as a reason to invoke an HA failover for the affected VMs, so those VMs suffer an outage. This is not acceptable when VMware HA is used with VPLEX in a stretched cluster configuration.
- VMware vSphere can act on the SCSI PDL state by powering off the VM, invoking HA failover. This behavior requires additional settings in the VMware vSphere cluster. For more information, see “Advanced VMware parameters to regulate PDL conditions” in the Tier-3 Platform Logical Build Guide. To obtain a copy of the guide, contact your Dell Technologies sales representative.
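As a compliance-style illustration of the additional settings this section refers to, the sketch below checks a host/cluster configuration against the PDL-related options. The option names used here (`Disk.terminateVMOnPDLDefault` as a host advanced setting and `das.maskCleanShutdownEnabled` as an HA cluster option) are commonly documented vSphere 5.x-era settings and are assumptions on my part; treat the "Advanced VMware parameters to regulate PDL conditions" section of the Tier-3 Platform Logical Build Guide as the authoritative list for your vSphere version.

```python
# Sketch: flag missing or mis-set PDL-related advanced settings.
# The option names are assumed from vSphere 5.x-era documentation --
# the Tier-3 Platform Logical Build Guide is authoritative.

REQUIRED_HOST_OPTIONS = {"Disk.terminateVMOnPDLDefault": True}
REQUIRED_CLUSTER_OPTIONS = {"das.maskCleanShutdownEnabled": "True"}


def missing_pdl_settings(host_options, cluster_options):
    """Return the required settings that are absent or mis-set,
    mapped to their expected values."""
    missing = {}
    for key, want in REQUIRED_HOST_OPTIONS.items():
        if host_options.get(key) != want:
            missing[key] = want
    for key, want in REQUIRED_CLUSTER_OPTIONS.items():
        if cluster_options.get(key) != want:
            missing[key] = want
    return missing


# A host with PDL termination disabled and no HA cluster option set
# is flagged on both counts.
print(missing_pdl_settings({"Disk.terminateVMOnPDLDefault": False}, {}))
```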
VMware vCenter Server placement
When implementing HA for VMware vCenter Server, adhere to the following best practices:
- In a stretched cluster solution, VMware vCenter Server requires site mobility capabilities to enable automatic failover to either converged system if there is an outage.
- Without VPLEX Metro and a VMware stretched clustering solution, each converged system would use its own instance of vCenter Server in the 3-Tier Management solution to manage its ESXi hosts. To deploy an active/active stretched clustering solution with VPLEX Metro, use a single VMware vCenter Server instance to manage all the hosts in the two converged systems that will participate in VMware stretched clusters.
The following options are supported on converged systems:
- If all ESXi hosts from Site-A and Site-B participate in a VMware stretched cluster, one vCenter Server instance manages all 3-Tier Management solution VMs from Site-A and Site-B, and all VMs on VPLEX distributed volumes that are attached to VMware ESXi hosts in Site-A and Site-B.
- For any VMware ESXi hosts in Site-A or Site-B or both that do not participate in a VMware stretched cluster, use the vCenter Server in each converged system to manage the 3-Tier Management solution and the local VMware ESXi hosts. Deploy a third vCenter server to manage the participating VMware ESXi hosts.
- Converged system management VMs include the VMware vCenter Server Appliance and the consolidated PowerPath vApp. All other management functions that reside in the 3-Tier Management solution remain local to their converged system.
- Ensure that you apply the additional VMware parameters to Cisco UCS C-Series servers when building a VMware stretched cluster. For more information, see “Advanced VMware parameters to regulate PDL conditions” in the Tier 3 Platform Logical Build Guide. To obtain a copy of the guide, contact your Dell Technologies sales representative.
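The supported vCenter Server placement options above form a simple decision rule, sketched below. The host names are invented, and the returned strings are just labels for the three layouts described in this section.

```python
# Sketch: the vCenter Server placement options above as a decision
# function. Host names are illustrative.


def vcenter_topology(hosts_in_stretched_cluster, all_hosts):
    """Return the supported vCenter Server layout for two converged
    systems, given which ESXi hosts participate in the stretched cluster."""
    if not hosts_in_stretched_cluster:
        # No stretched cluster: each converged system keeps its own vCenter.
        return "one vCenter Server per converged system"
    if hosts_in_stretched_cluster == all_hosts:
        # Everything participates: a single vCenter manages both sites.
        return "single vCenter Server for all hosts at both sites"
    # Mixed case: local vCenters manage non-participating hosts; a third
    # vCenter Server manages the stretched-cluster hosts.
    return ("local vCenter Server per converged system plus a third "
            "vCenter Server for the stretched-cluster hosts")


print(vcenter_topology({"esx1", "esx2"}, {"esx1", "esx2"}))
```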