Dell Solutions for Azure Stack HCI stretched clusters offer distinct network topologies that are validated with the following stretched cluster configurations:
- Basic configuration
- High throughput configuration
The basic configuration uses a network topology that requires minimal changes to a traditional single-site Azure Stack HCI configuration. This configuration uses a single network/fabric for management, VM, and replication traffic, keeping host networking simple. The customer network team must configure quality of service (QoS) on an external firewall or router to throttle inter-site bandwidth and thereby ensure that Replica/VM traffic does not saturate the management network.
The high throughput configuration suits dense customer environments that generate higher write IOPS than a basic configuration. This configuration requires a dedicated channel (network interface cards (NICs) or fabric) for Replica traffic, using SMB Multichannel. Use this network topology only if inter-site bandwidth is higher than 10 Gbps. The network team must configure multiple static routes on each host to ensure that Replica traffic uses the dedicated channel that has been created for it. If the customer environment does not use Border Gateway Protocol (BGP) at the top-of-rack (ToR) layer, static routes are also needed on the L2/L3 devices to ensure that the Replica networks reach the intended destination. Subsequent sections of this guide provide more information about what is expected of customer networking teams.
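As a rough illustration of the per-host static-route requirement, the sketch below generates `New-NetRoute` commands that pin remote replica subnets to the dedicated replica interface. All subnets, gateways, and the interface alias are hypothetical examples, not values from this guide; substitute the addressing plan for your sites.

```python
# Sketch: generate per-host static-route commands so that Replica traffic
# uses the dedicated replica channel. All addresses and the interface
# alias below are hypothetical examples.

# (remote replica subnet, next-hop gateway on the local replica fabric)
REPLICA_ROUTES = [
    ("192.168.101.0/24", "192.168.100.1"),
    ("192.168.102.0/24", "192.168.100.1"),
]
REPLICA_NIC_ALIAS = "Replica1"  # hypothetical host NIC alias

def netroute_commands(routes, alias):
    """Emit one New-NetRoute command per remote replica subnet."""
    return [
        f'New-NetRoute -DestinationPrefix {prefix} '
        f'-InterfaceAlias "{alias}" -NextHop {gateway}'
        for prefix, gateway in routes
    ]

for cmd in netroute_commands(REPLICA_ROUTES, REPLICA_NIC_ALIAS):
    print(cmd)
```

The generated commands would be run on each cluster node so that traffic destined for the remote site's replica subnets egresses through the replica NIC rather than the management network.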
A stretched cluster environment has two storage pools, one per site. In both topologies described in the preceding section, storage traffic requires Remote Direct Memory Access (RDMA) to transfer data between nodes within the same site. Because Storage and Replica traffic produces heavy throughput on an all-flash or NVMe configuration, we recommend that you keep the Storage traffic on separate redundant physical NICs.
This table shows the types of traffic, the protocol used, and the recommended bandwidth:
| Types of traffic   | Protocol used | Recommended bandwidth |
|--------------------|---------------|-----------------------|
| Management         | TCP           | 1/10/25 Gb            |
| Replica            | TCP           | 1/10/25 Gb            |
| Intra-site storage | RDMA          | 10/25 Gb              |
| Compute Network    | TCP           | 10/25 Gb              |
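The recommendations in the table can serve as a simple planning check: each traffic type's planned NIC speed should meet at least the lowest recommended tier. The sketch below encodes the table and flags shortfalls; the planned speeds are hypothetical example values.

```python
# Sketch: validate planned NIC speeds against the recommended bandwidth
# tiers from the table above (values in Gb).

RECOMMENDED_GB = {
    "Management": (1, 10, 25),
    "Replica": (1, 10, 25),
    "Intra-site storage": (10, 25),
    "Compute Network": (10, 25),
}

def meets_recommendation(traffic, speed_gb):
    """True if the planned speed reaches the lowest recommended tier."""
    return speed_gb >= min(RECOMMENDED_GB[traffic])

# Hypothetical planned NIC speeds for one design, in Gb.
planned = {
    "Management": 10,
    "Replica": 25,
    "Intra-site storage": 25,
    "Compute Network": 10,
}

for traffic, speed in planned.items():
    status = "OK" if meets_recommendation(traffic, speed) else "below recommendation"
    print(f"{traffic}: {speed} Gb -> {status}")
```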
Here are some points to consider about network configuration:
- Management traffic uses Transmission Control Protocol (TCP). Because management traffic uses minimal bandwidth, it can be combined with Storage Replica traffic or even use the LOM, OCP, or rNDC ports.
- VM Compute traffic can be combined with management traffic.
- Inter-site Live Migration traffic uses the same network as Storage Replica.
- Storage Replica uses TCP because RDMA is not supported for replica traffic over L3 or WAN links. Depending on the bandwidth and latency between sites and the throughput requirements of the cluster, consider using separate redundant physical NICs for Storage Replica traffic.
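When weighing the basic topology against the high throughput topology, a back-of-the-envelope estimate of steady-state replica traffic can help: every write committed at the primary site must also be shipped to the secondary site. The sketch below applies a simple IOPS × block-size calculation; the workload numbers are assumptions, and real sizing should also account for initial sync, resync, and burst traffic.

```python
def replica_bandwidth_gbps(write_iops, block_size_kib):
    """Approximate steady-state Storage Replica traffic: each write at the
    primary site is replicated to the secondary site."""
    bytes_per_sec = write_iops * block_size_kib * 1024
    return bytes_per_sec * 8 / 1e9  # convert bytes/s to Gbps

# Hypothetical workload: 100,000 write IOPS at a 32 KiB block size.
gbps = replica_bandwidth_gbps(100_000, 32)
print(f"Estimated replica traffic: {gbps:.1f} Gbps")
# If the estimate approaches or exceeds ~10 Gbps, the high throughput
# configuration with a dedicated replica channel is the better fit.
```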