The concept of a stretched cluster is a good example of vSAN’s native integration with vSphere. With VxRail, stretched clustering extends availability across enterprise datacenters. A stretched cluster is a specific configuration implemented in environments where the requirement for datacenter-level downtime avoidance is absolute. Just as fault domains enable “rack awareness” to survive rack failures, stretched clusters provide “datacenter awareness,” maintaining virtual machine availability despite specific datacenter failure scenarios.
In a VxRail environment, a stretched cluster with a witness host refers to a deployment where a vSAN cluster consists of two active/active sites with an identical number of ESXi hosts distributed evenly between them. The sites are connected by a high-bandwidth, low-latency network.
In the figure below, each site is configured as a vSAN fault domain. The nomenclature used to describe the stretched cluster configuration is X+Y+Z, where X is the number of ESXi hosts at Site A, Y is the number of ESXi hosts at Site B, and Z is the number of witness hosts at Site C.
Figure 53. Stretched VxRail cluster
A virtual machine deployed on a stretched cluster has one copy of its data on Site A, and another on Site B, as well as witness components placed on the host at Site C.
This configuration is achieved through a combination of fault domains, host and VM groups, and affinity rules. In the event of a complete site failure, the surviving site still has a full copy of the virtual machine data, and at least half of the resource components remain available. That means all the VMs remain active and available on the vSAN datastore. The recovery point objective (RPO) is zero and the data recovery time objective (RTO) is zero; the application RTO depends on the application’s recoverability.
The minimum configuration supported by VxRail is 3+3+1 (7 nodes); the maximum is 15+15+1 (31 nodes). Stretched clusters are supported by both hybrid and all-flash VxRail configurations. Stretched clusters running the latest VxRail software versions support customer-driven upgrades; for clusters running older VxRail software, customers need to contact support to facilitate upgrades.
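The supported node counts can be expressed as a short validity check. The following is a minimal sketch (a hypothetical helper, not a VxRail tool) that applies the rules above: symmetric data sites of 3 to 15 hosts each, plus exactly one witness host.

```python
def is_supported_stretched_config(site_a: int, site_b: int, witnesses: int) -> bool:
    """Check a VxRail stretched cluster X+Y+Z layout against the supported
    range: identical data-site host counts of 3-15 each plus one witness."""
    return (
        site_a == site_b        # hosts must be distributed evenly between sites
        and 3 <= site_a <= 15   # minimum 3+3+1 (7 nodes), maximum 15+15+1 (31 nodes)
        and witnesses == 1      # exactly one witness host at the third site
    )

print(is_supported_stretched_config(3, 3, 1))    # minimum supported: True
print(is_supported_stretched_config(15, 15, 1))  # maximum supported: True
print(is_supported_stretched_config(4, 5, 1))    # asymmetric sites: False
```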
For more information, refer to VxRail vSAN Stretched Clusters Planning Guide: https://vxrail.is/stretchedclusterplanning
VxRail software version 4.5 and vSAN 6.6 or above support Stretched Clusters with Local Protection. This feature mirrors data between sites, with each site applying local data protection for increased protection. The protection is specified using two parameters: Primary Failures To Tolerate (PFTT) and Secondary Failures To Tolerate (SFTT). PFTT refers to the protection between sites, which is always RAID1 mirroring. SFTT is the local protection applied at each site. Hybrid configurations support SFTT of 0, 1, 2, or 3 with the RAID1 (mirroring) Failure Tolerance Method (FTM). All-flash configurations support SFTT of 0, 1, 2, or 3 with RAID1 FTM, or SFTT of 1 or 2 with the Erasure Coding FTM.
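These SFTT/FTM rules can be captured in a small validity check. The sketch below is a hypothetical illustration of the combinations stated above (the `cluster_type` and `ftm` labels are assumptions for this example, not vSAN policy identifiers):

```python
def is_valid_local_protection(cluster_type: str, sftt: int, ftm: str) -> bool:
    """Validate SFTT/FTM combinations for Stretched Clusters with Local
    Protection. PFTT between sites is always RAID1 and is not modeled here."""
    if ftm == "RAID1":   # mirroring: supported on both hybrid and all-flash
        return sftt in (0, 1, 2, 3)
    if ftm == "EC":      # erasure coding: all-flash only, SFTT 1 or 2
        return cluster_type == "all-flash" and sftt in (1, 2)
    return False

print(is_valid_local_protection("hybrid", 3, "RAID1"))   # True
print(is_valid_local_protection("hybrid", 1, "EC"))      # False: EC needs all-flash
print(is_valid_local_protection("all-flash", 2, "EC"))   # True
```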
Local protection for an all-flash stretched cluster configuration is shown in the figure below.
Figure 54. Stretched cluster with local protection
In a conventional storage-cluster configuration, reads are distributed across replicas. In a stretched cluster configuration, the vSAN Distributed Object Manager (DOM) also takes into account the object’s fault domain, and only reads from replicas in the same domain. That way, it avoids any lag time associated with using the inter-site network to perform reads.
Both Layer-2 (same subnet) and Layer-3 (routed) configurations are used for stretched cluster deployments. A Layer-2 or Layer-3 connection is configured between the data sites, and a Layer-3 connection is used between the witness and each data site.
The bandwidth between data sites depends on the workloads, but Dell EMC requires a minimum of 10 Gbps for VxRail systems in a stretched cluster configuration. The supported latency to the witness host is up to 200 ms RTT, with a bandwidth of 2 Mbps for every 1,000 vSAN objects. Also bear in mind that the latency between data sites should be no greater than 5 ms RTT, which generally corresponds to about 500 km (roughly 310 miles).
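The witness-link sizing rule of thumb above (2 Mbps per 1,000 vSAN objects) lends itself to a quick estimate. A minimal sketch, assuming bandwidth is rounded up to the next 1,000-object increment:

```python
import math

def required_witness_bandwidth_mbps(vsan_objects: int) -> int:
    """Estimate witness-link bandwidth from the stated rule of thumb:
    2 Mbps for every 1,000 vSAN objects, rounded up per increment."""
    return 2 * math.ceil(vsan_objects / 1000)

print(required_witness_bandwidth_mbps(1000))   # 2 Mbps
print(required_witness_bandwidth_mbps(1500))   # 4 Mbps (partial increment rounds up)
print(required_witness_bandwidth_mbps(25000))  # 50 Mbps
```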
Stretched cluster configurations effectively have three fault domains. The first functions as the preferred data site, the second is the secondary data site, and the third is simply the witness host site.
The vSAN master node is placed on the preferred site and the vSAN backup node is placed on the secondary site. As long as nodes (ESXi hosts) are available in the preferred site, the master is always selected from one of the nodes on that site; the same applies to the backup on the secondary site.
The master node and the backup node exchange heartbeats every second; the master also exchanges heartbeats with the witness. If heartbeat communication with the witness is lost for five consecutive heartbeats (five seconds), the witness is deemed to have failed. If the witness has suffered a permanent failure, a new witness host can be configured and added to the cluster. The preferred site gains ownership in the event of a partition.
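The five-second detection window amounts to a simple threshold check. The sketch below is a hypothetical illustration of that logic, not vSAN’s actual implementation:

```python
def witness_failure_detected(last_heartbeat_s: float, now_s: float,
                             interval_s: float = 1.0, max_misses: int = 5) -> bool:
    """Heartbeats are sent every second; after five consecutive missed
    heartbeats (five seconds of silence) the witness is declared failed."""
    return (now_s - last_heartbeat_s) >= max_misses * interval_s

print(witness_failure_detected(last_heartbeat_s=0.0, now_s=4.0))  # False: within window
print(witness_failure_detected(last_heartbeat_s=0.0, now_s=5.0))  # True: five seconds elapsed
```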
After a complete site failure, both the master and the backup end up at the sole remaining live site. Once the failed site returns, it resumes its designated role as preferred or secondary, and the master and backup migrate back to their respective sites.
A stretched cluster requires the following vSphere HA settings:
Host monitoring is enabled by default in all VxRail deployments, including stretched cluster configurations. This feature uses network heartbeats to determine the status of hosts participating in the cluster, and it indicates a possible need for remediation, such as restarting virtual machines on other cluster nodes.
Configuring admission control ensures that vSphere HA has sufficient resources available to restart virtual machines after a failure. This is even more significant in a stretched cluster than in a single-site cluster, because it makes the entire multi-site infrastructure resilient. Workload availability is perhaps the primary motivation behind most stretched cluster implementations.
The deployment needs sufficient capacity to accommodate a full site failure. Since the stretched cluster equally divides the number of ESXi hosts between sites, Dell EMC recommends configuring the admission-control policy to 50 percent for both CPU and memory to ensure that all workloads can be restarted by vSphere HA.
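The effect of the 50 percent recommendation can be shown with some simple arithmetic. The following is a minimal sketch; the host counts and per-host figures are hypothetical, chosen only to illustrate the calculation:

```python
def usable_capacity(total_cpu_ghz: float, total_mem_gb: float,
                    reserve_pct: float = 50.0) -> tuple[float, float]:
    """With a 50% CPU/memory admission-control reservation, half of the
    cluster's aggregate capacity stays free so vSphere HA can restart all
    workloads after a full site failure."""
    factor = 1.0 - reserve_pct / 100.0
    return total_cpu_ghz * factor, total_mem_gb * factor

# Hypothetical 6+6+1 cluster: 12 data-site hosts, each with 2x 16-core
# 2.4 GHz CPUs (76.8 GHz) and 512 GB RAM.
cpu, mem = usable_capacity(12 * 76.8, 12 * 512)
print(round(cpu, 1), round(mem, 1))  # roughly half: 460.8 GHz, 3072.0 GB
```

Reserving half the cluster mirrors the even host split between sites: if one site is lost, the surviving site’s unreserved half is exactly what the failed site’s workloads need.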