Stretch clustering, which is included with Azure Stack HCI, must be configured on validated hardware listed in the Azure Stack HCI Catalog. The engineering team at Dell Technologies has deliberately designed and validated the Dell Integrated System for Microsoft Azure Stack HCI to optimize the HCI experience. The Dell Integrated System for Microsoft Azure Stack HCI: Stretched Cluster Deployment Reference Architecture Guide is based on detailed results from extensive testing of stretch clustering in the labs.
Critical design considerations and best practices in this reference architecture include:
- The cluster size must be a minimum of 4 nodes and a maximum of 16 nodes. Both sites must be running the same number of AX nodes in the cluster, and these nodes must have identical hardware configurations.
- Only the AX-640 and AX-740xd Azure Stack HCI platforms running a single storage tier with all-flash or all-NVMe configurations have been validated to run stretch clustering.
- Consider using two-way mirror volumes for best performance. Capacity requirements must be carefully weighed considering the raw storage used and the storage efficiency of mirror volumes in a stretch clustering configuration. Because each site keeps its own two-way mirror of the replicated data, a two-way mirror results in 25 percent overall storage efficiency across both sites.
- Latency between sites is a significant consideration in a stretch clustering environment. The networks connecting the two sites must have enough bandwidth to accommodate the data rate of change and to carry the write I/O workload. Although we make no specific recommendations on distance between sites, note the following guidelines:
  - For both synchronous and asynchronous replication, aim for less than 200-millisecond roundtrip latency between the AX nodes and the witness. If the witness is a highly available file share, it must be created at a tertiary site and not at either site hosting the stretch cluster nodes. The witness could also be an Azure cloud witness.
  - For synchronous replication, aim for an average roundtrip latency of 5 milliseconds or less between the AX nodes in Site 1 and the AX nodes in Site 2.
  - For asynchronous replication, which has no specific latency requirements, the most important consideration is correct RPO configuration.
- I/O-intensive workloads are not good candidates for stretch clustering environments. Both synchronous and asynchronous replication generate high I/O to the underlying drives during write operations, which can result in a significant performance impact.
- Storage Replica is a general-purpose, storage-agnostic replication engine, not a backup and restore solution. Use Storage Replica along with backup and recovery software capabilities, and back up only the active volumes, not the replica or log volumes.
- Configure preferred sites in a stretched cluster to define the location where all resources run. This configuration ensures that VMs and volumes become available on the preferred site after a cold start or after network connectivity issues.
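
The capacity consideration above can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of the reference architecture: the function names are hypothetical, and it assumes that every volume is replicated to the second site, that each site keeps the same number of mirror copies, and that reserve capacity and Storage Replica log volumes are ignored.

```python
def storage_efficiency(mirror_copies: int = 2, sites: int = 2) -> float:
    """Fraction of total raw capacity that is usable.

    Assumption: every site holds a full replica of the data and stores
    `mirror_copies` copies of it locally.
    """
    if mirror_copies < 1 or sites < 1:
        raise ValueError("mirror_copies and sites must be at least 1")
    return 1.0 / (mirror_copies * sites)


def usable_capacity_tb(raw_tb_per_site: float,
                       mirror_copies: int = 2,
                       sites: int = 2) -> float:
    """Usable capacity in TB for the whole stretch cluster.

    Assumption: reserve capacity and replication log volumes are not
    subtracted, so real-world usable capacity will be lower.
    """
    if raw_tb_per_site <= 0:
        raise ValueError("raw_tb_per_site must be positive")
    total_raw_tb = raw_tb_per_site * sites
    return total_raw_tb * storage_efficiency(mirror_copies, sites)


if __name__ == "__main__":
    # Two sites with 100 TB raw each, two-way mirror in each site.
    print(storage_efficiency())        # 0.25
    print(usable_capacity_tb(100.0))   # 50.0
```

With 100 TB of raw capacity in each site, the cluster holds 200 TB raw in total but presents roughly 50 TB of usable space, which is the 25 percent figure quoted for two-way mirror volumes.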
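
The sizing, platform, latency, and witness guidelines in this list can also be captured as a simple pre-deployment checklist. The sketch below is a hypothetical planning aid, not a Dell or Microsoft tool: the `StretchDesign` fields and the `check_design` function are invented for illustration, and the thresholds are copied from the guidelines above.

```python
from dataclasses import dataclass
from typing import List

# Platforms named in this guide as validated for stretch clustering.
VALIDATED_MODELS = {"AX-640", "AX-740xd"}


@dataclass
class StretchDesign:
    """Hypothetical description of a planned stretch cluster."""
    site1_nodes: int
    site2_nodes: int
    model: str
    replication: str        # "synchronous" or "asynchronous"
    site_rtt_ms: float      # average roundtrip latency between sites
    witness_rtt_ms: float   # roundtrip latency from nodes to witness
    witness_location: str   # "site1", "site2", "tertiary", or "azure"


def check_design(d: StretchDesign) -> List[str]:
    """Return a list of guideline violations (empty if none found)."""
    problems = []
    total = d.site1_nodes + d.site2_nodes
    if not 4 <= total <= 16:
        problems.append("cluster size must be between 4 and 16 nodes")
    if d.site1_nodes != d.site2_nodes:
        problems.append("both sites must run the same number of nodes")
    if d.model not in VALIDATED_MODELS:
        problems.append("platform is not validated for stretch clustering")
    if d.witness_rtt_ms >= 200:
        problems.append("witness roundtrip latency should be under 200 ms")
    if d.replication == "synchronous" and d.site_rtt_ms > 5:
        problems.append("synchronous replication needs 5 ms or less between sites")
    if d.witness_location in ("site1", "site2"):
        problems.append("witness must not be hosted at either cluster site")
    return problems


if __name__ == "__main__":
    plan = StretchDesign(2, 2, "AX-740xd", "synchronous", 3.0, 40.0, "azure")
    for issue in check_design(plan) or ["no guideline violations found"]:
        print(issue)
```

A check like this covers only the numeric and structural guidelines. It does not replace cluster validation, and it says nothing about bandwidth, RPO settings, or workload suitability, which still need to be assessed separately.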