Dell Technologies has validated the integrated system for Azure Stack HCI stretched clusters by creating a detailed and prescriptive reference architecture. The Reference Architecture—Dell EMC Integrated System for Microsoft Azure Stack HCI: Stretched Cluster Deployment provides an overview of the Microsoft Azure Stack HCI operating system and guidance on deploying stretched clusters in your environment. A robust set of configurations and server models enables you to customize your infrastructure for application performance, capacity, or deployment location requirements.
For a better understanding of stretched clustering, the following figure shows an example of an eight-node stretched cluster spanning two sites, Site A and Site B. Each site has its own storage pool of disks, on top of which volumes are created. Volumes are stretched across the two sites to provide better resiliency.
Figure 43. Stretch clustering topology
Each site must have the same number of servers with the same hardware configuration. These servers are the nodes that are part of the stretched cluster. To implement a robust configuration, deploy a minimum of four nodes (two per site); a stretched cluster supports a maximum of 16 nodes (eight per site) across both sites.
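These sizing rules lend themselves to a quick programmatic check. The following is a minimal sketch in Python of such a validation; the function name and parameters are illustrative and not part of any Dell or Microsoft tooling.

```python
def validate_stretched_layout(site_a_nodes: int, site_b_nodes: int) -> None:
    """Check a proposed stretched cluster layout against the sizing rules
    described above: symmetric sites, 4 to 16 nodes total (2 to 8 per site)."""
    if site_a_nodes != site_b_nodes:
        raise ValueError("Both sites must have the same number of servers.")
    total = site_a_nodes + site_b_nodes
    if total < 4:
        raise ValueError("A stretched cluster needs at least 4 nodes (2 per site).")
    if total > 16:
        raise ValueError("A stretched cluster supports at most 16 nodes (8 per site).")

# Example: the eight-node stretched cluster shown in the figure above.
validate_stretched_layout(site_a_nodes=4, site_b_nodes=4)
```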
S2D provides the software-defined storage layer for Azure Stack HCI. A stretched cluster environment has two storage pools, one per site. The Azure Stack HCI operating system can stretch volumes across sites so that they appear as a single volume. The primary volume is accessible from the nodes at its site. The secondary volume at the other site is a standby replica that is brought online when the primary volume goes offline.
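As a conceptual illustration of this primary/standby relationship, the sketch below models a stretched volume in Python. It is a simplified model of the behavior described above, not an API exposed by Azure Stack HCI; all names are illustrative.

```python
class StretchedVolume:
    """Toy model of a stretched volume: a primary replica serves I/O while
    the secondary at the other site stays in standby."""

    def __init__(self, name: str, primary_site: str, secondary_site: str):
        self.name = name
        self.primary_site = primary_site
        self.secondary_site = secondary_site
        self.online_site = primary_site  # only the primary serves I/O

    def fail_primary(self) -> None:
        # When the primary volume goes offline, the standby replica
        # at the other site is brought online.
        self.online_site = self.secondary_site

vol = StretchedVolume("Volume01", primary_site="SiteA", secondary_site="SiteB")
assert vol.online_site == "SiteA"
vol.fail_primary()
assert vol.online_site == "SiteB"  # the secondary now serves I/O
```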
The servers residing in different sites replicate volumes either synchronously or asynchronously. With the synchronous approach, writes are replicated to persistent storage at both sites before they are acknowledged. With the asynchronous approach, writes are acknowledged as soon as they are persisted at one site and are then replicated to the other site moments later, based on the chosen recovery point objective (RPO). You can also create local volumes that operate within the boundaries of a single site.
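The practical difference between the two modes is when the write is acknowledged. Here is a minimal sketch of those acknowledgment semantics, in which simple in-memory lists stand in for persistent storage at each site; it illustrates the concept only, not the actual replication engine.

```python
from collections import deque

site_a_log: list[bytes] = []   # persistent log at the primary site
site_b_log: list[bytes] = []   # persistent log at the secondary site
replication_queue: deque[bytes] = deque()  # async writes awaiting transfer

def write_sync(data: bytes) -> str:
    """Synchronous replication: persist at both sites, then acknowledge."""
    site_a_log.append(data)
    site_b_log.append(data)
    return "ack"

def write_async(data: bytes) -> str:
    """Asynchronous replication: persist locally, acknowledge immediately,
    and queue the write for transfer to the other site moments later."""
    site_a_log.append(data)
    replication_queue.append(data)
    return "ack"

def drain_replication_queue() -> None:
    """Background step: ship queued writes to the secondary site.
    Whatever remains queued bounds the potential data loss (the RPO)."""
    while replication_queue:
        site_b_log.append(replication_queue.popleft())

print(write_sync(b"row1"))   # "ack" only after both logs hold the write
print(write_async(b"row2"))  # "ack" immediately; replication still pending
drain_replication_queue()    # the secondary site catches up
```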
A stretched cluster can be set up as either active/active or active/passive. In an active/active setup, both sites actively run VMs or applications, so replication occurs bidirectionally. In an active/passive setup, the passive site remains dormant except during a failure or planned downtime, waiting to receive an automatic failover from the active site.
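One way to picture the distinction is in terms of replication direction. The sketch below is illustrative only; the type and site names are hypothetical, not Azure Stack HCI settings.

```python
from dataclasses import dataclass

@dataclass
class ReplicationTopology:
    mode: str                           # "active/active" or "active/passive"
    directions: list[tuple[str, str]]   # (source site, destination site)

# Both sites run workloads, so each replicates its volumes to the other.
active_active = ReplicationTopology("active/active",
                                    [("SiteA", "SiteB"), ("SiteB", "SiteA")])

# Only the active site runs workloads; the passive site stays dormant and
# receives replicas until a failure or planned downtime triggers failover.
active_passive = ReplicationTopology("active/passive",
                                     [("SiteA", "SiteB")])
```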
In a stretched cluster, designate one site as the preferred site to define the location on which all resources run; the other site becomes the secondary site. If the network connection between the two active sites is lost, the preferred site is the one that remains operational. These designations also ensure that VMs and volumes come up on the preferred site after a cold start or after network connectivity issues are resolved.
Azure Stack HCI enables you to define affinity and anti-affinity controls for VMs in a cluster. An affinity rule keeps two VMs in the same site to ensure locality, for example, the web tier of an application and its database. An anti-affinity rule keeps two VMs on separate sites, for example, two domain controllers hosted at different sites.
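The placement check that these rules imply can be sketched briefly. The following Python model assumes a simple mapping of VMs to sites; it does not reflect the rule objects or cmdlets that Azure Stack HCI actually uses.

```python
def check_rules(placement: dict[str, str],
                affinity: list[tuple[str, str]],
                anti_affinity: list[tuple[str, str]]) -> list[str]:
    """Return the rule violations for a proposed VM-to-site placement."""
    violations = []
    for vm1, vm2 in affinity:
        if placement[vm1] != placement[vm2]:
            violations.append(f"affinity: {vm1} and {vm2} must share a site")
    for vm1, vm2 in anti_affinity:
        if placement[vm1] == placement[vm2]:
            violations.append(f"anti-affinity: {vm1} and {vm2} must be on separate sites")
    return violations

placement = {"web01": "SiteA", "sql01": "SiteA", "dc01": "SiteA", "dc02": "SiteB"}
# Keep the web tier with its database; keep the domain controllers apart.
print(check_rules(placement,
                  affinity=[("web01", "sql01")],
                  anti_affinity=[("dc01", "dc02")]))  # -> []
```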
To arbitrate quorum in a stretched cluster, a tie-breaker vote, often called a witness, is required. Dynamic quorum ensures that the preferred site survives if network connectivity fails: vote weighting is decreased from the passive (replicated) site first, so that, all other things being equal, the preferred site survives. In addition, server nodes are pruned from the passive site first during regrouping after events such as an asymmetric network connectivity failure. During a quorum split across the two sites, if the cluster witness cannot be contacted, the preferred site is automatically elected and the server nodes in the passive site drop out of cluster membership. This behavior allows the cluster to survive a simultaneous loss of 50 percent of its votes.
The preferred site can also be configured at the cluster role or group level. In that case, a different preferred site can be configured for each VM group, which enables each site to be active and preferred for specific VMs.
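To make the tie-break behavior concrete, here is a minimal sketch of the election logic for a site split, assuming a simple model in which each node and the witness carry one vote; the real failover clustering algorithm is considerably more involved.

```python
def surviving_site(site_a_votes: int, site_b_votes: int,
                   witness_reachable_from: str | None,
                   preferred_site: str = "SiteA") -> str:
    """Decide which site keeps quorum after a split between two sites.

    A site with a strict majority of votes wins outright. On an even
    split, the witness vote breaks the tie; if the witness is also
    unreachable, the preferred site is elected automatically and the
    nodes at the other site drop out of cluster membership.
    """
    if site_a_votes != site_b_votes:
        return "SiteA" if site_a_votes > site_b_votes else "SiteB"
    if witness_reachable_from is not None:
        return witness_reachable_from   # the witness supplies the tie-breaker vote
    return preferred_site               # 50 percent vote loss: preferred site wins

# Four nodes per site, witness unreachable: the preferred site survives.
print(surviving_site(4, 4, witness_reachable_from=None))  # -> SiteA
```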