This section provides requirements and general recommendations for storage, CPU, memory, and link bandwidth sizing.
- A minimum of 25 percent to 30 percent of spare storage capacity is required for a 2-node cluster.
- A 2-node cluster has RAID 1 protection. If a node fails, the surviving node continues to operate with a single component of each object.
- When defining CPU and memory capacity, consider the minimum capacity required to satisfy the VM requirements when a failed state exists.
- In general, size the cluster to operate below 50 percent of its maximum CPU capacity, taking the projected growth in consumption into account.
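The failed-state sizing rule above can be sketched as a quick check. This is a minimal illustration, not a sizing tool: the `VM` class, the `failed_state_check` function, and the sample figures are hypothetical, and the sketch ignores hypervisor and vSAN overhead.

```python
from dataclasses import dataclass


@dataclass
class VM:
    cpu_ghz: float  # expected CPU demand of the VM (assumed input)
    mem_gb: float   # configured memory of the VM (assumed input)


def failed_state_check(vms, node_cpu_ghz, node_mem_gb, growth=0.0):
    """Check whether one surviving node can carry every VM.

    growth is the projected fractional increase in consumption
    (0.2 means 20 percent).
    """
    cpu_needed = sum(vm.cpu_ghz for vm in vms) * (1 + growth)
    mem_needed = sum(vm.mem_gb for vm in vms) * (1 + growth)
    # CPU utilization of the healthy 2-node cluster. At or below 0.5,
    # the same load fits within the CPU capacity of a single node.
    cluster_cpu_utilization = cpu_needed / (2 * node_cpu_ghz)
    return {
        "cpu_needed_ghz": cpu_needed,
        "mem_needed_gb": mem_needed,
        "cluster_cpu_utilization": cluster_cpu_utilization,
        "fits_on_one_node": cpu_needed <= node_cpu_ghz
        and mem_needed <= node_mem_gb,
    }


# Hypothetical example: 10 identical VMs with 20 percent projected growth.
vms = [VM(cpu_ghz=2.0, mem_gb=8.0) for _ in range(10)]
result = failed_state_check(vms, node_cpu_ghz=40.0, node_mem_gb=128.0, growth=0.2)
print(result)
```

In this example, the cluster runs at about 30 percent CPU utilization while healthy, so the full load still fits on one node if the other fails.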
Our measurements indicate that a regular T1 link can satisfy the network bandwidth requirements for communication between the data nodes and vCenter Server and between the data nodes and the witness appliance. However, to adapt the solution to different service-level requirements, you must understand the requirements for:
- Normal cluster operations
- Witness contingencies
- Services, such as maintenance, life cycle management, and troubleshooting
Figure 6. Network bandwidth planning considerations
Normal cluster operations
- Normal cluster operations include the traffic between data nodes, vCenter Server, and the witness appliance.
- During normal operations, the bulk of the traffic is between data nodes and vCenter Server. This traffic is affected primarily by the number of VMs and the number of components but is typically a light load.
- Our measurements of a cluster with 25 VMs and nearly 1,000 components indicated a bandwidth consumption lower than 0.3 Mbps. See Figure 6.
- The witness appliance does not maintain any data, only metadata components.
- The witness traffic can be influenced by the I/O workload running in the cluster, but, in general, there is light traffic while the cluster is in a normal state.
- If the primary node fails or is partitioned, the following events occur:
  - vSAN powers off the VMs on the failed host.
  - The secondary node is elected as the HA primary. The witness host sends updates to the new primary, and the new primary then acknowledges that the ownership is updated. Each component update requires 1,138 bytes.
  - When the update is completed, quorum is formed between the secondary host and the witness host, allowing the VMs to access their data and be powered on.
- The failover procedure requires enough bandwidth to allow for the ownership of components to change within a short interval of time.
- Our recommendation for a 2-node cluster with up to 25 VMs is that at least 0.8 Mbps be available to ensure a successful failover operation. See Figure 7.
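As a rough illustration of why failover bandwidth matters, the following sketch estimates how long the ownership updates take at a given link speed. It assumes one 1,138-byte update per component, no protocol overhead, and 1 Mbps = 1,000,000 bits per second. The function names are hypothetical, and the estimate covers only the update payload, not the full failover sequence.

```python
BYTES_PER_COMPONENT_UPDATE = 1138  # per-component update size stated above


def ownership_update_seconds(components, bandwidth_mbps):
    """Estimate the time to send one ownership update per component."""
    total_bits = components * BYTES_PER_COMPONENT_UPDATE * 8
    return total_bits / (bandwidth_mbps * 1_000_000)


def required_mbps(components, target_seconds):
    """Estimate the bandwidth needed to finish the updates in target_seconds."""
    total_bits = components * BYTES_PER_COMPONENT_UPDATE * 8
    return total_bits / target_seconds / 1_000_000


# About 1,000 components (the 25-VM cluster measured above) at 0.8 Mbps
# takes roughly 11.4 seconds under these assumptions.
print(ownership_update_seconds(1000, 0.8))
```

The `required_mbps` helper inverts the same arithmetic, which can help when a service-level requirement fixes the acceptable failover interval instead of the link speed.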
Maintenance, life cycle management, and troubleshooting
- The desired transfer time for large files primarily determines the amount of bandwidth to reserve for maintenance, life cycle management, and troubleshooting.
- The log files that are used in troubleshooting are compressed and typically can be transferred in a reasonable time.
- The composite files that are used for software and firmware upgrades can be up to 4.0 GB and can take a long time to transfer over a T1 link. Evaluate bandwidth requirements if you have specific maintenance window requirements.
- As a reference, if you are using a T1 link, we assume that at least 1 Mbps of bandwidth is available for the transfer of the composite file. We estimate that this transfer will take about nine hours.
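The nine-hour estimate can be reproduced with simple arithmetic. The sketch below assumes 1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second, and no protocol overhead. The function names are hypothetical.

```python
T1_MBPS = 1.544  # nominal line rate of a T1 link


def transfer_hours(size_gb, bandwidth_mbps):
    """Estimate the hours needed to transfer size_gb at bandwidth_mbps."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (bandwidth_mbps * 1e6) / 3600


def mbps_for_window(size_gb, window_hours):
    """Estimate the bandwidth needed to finish within a maintenance window."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (window_hours * 3600) / 1e6


print(round(transfer_hours(4.0, 1.0), 1))  # 8.9 hours, that is, about nine hours
print(mbps_for_window(4.0, 4.0) > T1_MBPS)  # True: a 4-hour window needs more than a T1
```

Under these assumptions, a four-hour maintenance window for a 4.0 GB composite file needs about 2.2 Mbps, which exceeds the capacity of a single T1 link.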