In this section, we offer general recommendations for storage, CPU, memory, and link bandwidth sizing.
- A minimum of 25% to 30% spare storage capacity is required for a 2-node cluster.
- Note that in a 2-node cluster, the protection method is RAID 1 (mirroring). If a node fails, the surviving node continues to operate with a single copy (component) of each object.
- When defining CPU and memory capacity, consider the minimum capacity required to satisfy the VM requirements while the cluster is in a failed state.
- The general recommendation is to size a cluster to operate below 50% of the maximum CPU required, taking into consideration the projected growth in consumption.
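The sizing guidance above can be sketched as a short calculation. This is a minimal illustration, not a sizing tool; the capacity figures and growth rate in the example are hypothetical, and the constants come from the recommendations in this section.

```python
# Sketch of the sizing guidance above. The 30% spare-capacity reserve and
# the 50% CPU ceiling come from this section; the example inputs below
# (20 TB raw, 16 cores, 10%/year growth) are hypothetical.

SPARE_STORAGE_FRACTION = 0.30   # reserve 25-30% spare capacity
MAX_CPU_UTILIZATION = 0.50      # operate below 50% of max CPU

def usable_storage(raw_capacity_tb: float) -> float:
    """Capacity left after reserving spare storage for a 2-node cluster."""
    return raw_capacity_tb * (1 - SPARE_STORAGE_FRACTION)

def required_cpu_cores(current_demand_cores: float,
                       annual_growth: float,
                       years: int) -> float:
    """Cores needed so projected demand stays below the 50% CPU ceiling."""
    projected = current_demand_cores * (1 + annual_growth) ** years
    return projected / MAX_CPU_UTILIZATION

print(usable_storage(20.0))             # usable TB from 20 TB raw
print(required_cpu_cores(16, 0.10, 3))  # cores for 16-core demand, 10%/yr, 3 yrs
```

With these example inputs, 20 TB raw yields 14 TB usable, and a 16-core demand growing 10% per year for three years calls for roughly 43 cores to stay under the 50% ceiling.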
Figure 5. CPU capacity planning
Our measurements indicate that a regular T1 link can satisfy the network bandwidth requirements for communications between the data nodes and vCenter Server, and between the data nodes and the Witness appliance. However, to adapt the solution to different service-level requirements, it is important to understand in more detail the requirements for:
- Normal cluster operations
- Witness contingencies
- Services, such as maintenance, lifecycle management, and troubleshooting
Figure 6. Network bandwidth planning considerations
Normal cluster operations
- Normal cluster operations include the traffic between data nodes, vCenter Server, and the Witness appliance.
- During normal operations, the bulk of the traffic is between the data nodes and vCenter Server. This traffic is affected primarily by the number of VMs and the number of components, but is typically a very light load.
- Our measurements of a cluster with 25 VMs and nearly 1,000 components indicated bandwidth consumption of less than 0.3 Mbps.
- The Witness appliance does not store any VM data; it maintains only metadata components.
- The Witness traffic can be influenced by the I/O workload running in the cluster, but in general this traffic is very small while the cluster is in a normal state.
Witness contingencies
- In the event the primary node fails or is partitioned, the following occurs:
- vSAN powers off the VMs on the failed host.
- The secondary node is elected as the HA primary. The Witness host sends updates to the new primary, followed by an acknowledgment from the primary that the ownership is updated.
- Each component update requires 1,138 bytes.
- When the updates are complete, quorum is formed between the secondary host and the Witness host, allowing the VMs to access their data and be powered on.
- The failover procedure requires enough bandwidth to allow for the ownership of components to change within a short interval of time.
- Our recommendation for a 2-node cluster with up to 25 VMs is to have at least 0.8 Mbps of bandwidth available to ensure a successful failover operation.
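A back-of-the-envelope check shows why 0.8 Mbps is sufficient for a failover of this size. This sketch uses only the figures given in this section (1,138 bytes per component update, roughly 1,000 components for a 25-VM cluster); the exact component count for any given cluster will vary.

```python
# Estimate the time to transfer all component-ownership updates during
# failover, using the figures from this section: 1,138 bytes per update
# and ~1,000 components for a 25-VM cluster, over 0.8 Mbps.

BYTES_PER_UPDATE = 1138
COMPONENT_COUNT = 1000      # approximate, from the 25-VM measurement above
BANDWIDTH_MBPS = 0.8        # recommended minimum for failover

total_bits = COMPONENT_COUNT * BYTES_PER_UPDATE * 8
transfer_seconds = total_bits / (BANDWIDTH_MBPS * 1_000_000)
print(f"{transfer_seconds:.1f} s")  # roughly 11 seconds of transfer time
```

At 0.8 Mbps, updating ownership for about 1,000 components takes on the order of 11 seconds, which keeps the failover within a short interval as required.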
Maintenance, lifecycle management and troubleshooting
- The amount of bandwidth reserved for maintenance, lifecycle management, and troubleshooting is determined primarily by the desired transfer times for large files.
- The log files that are used in troubleshooting are compressed and typically can be transferred in a reasonable time.
- The composite files that are used for software and firmware upgrades can be up to 4.0 GB and can take a long time to be transferred when using a T1 link. The bandwidth requirements should be evaluated in case you have specific maintenance window requirements.
- As a reference, when using a T1 link, we expect at least 1 Mbps of bandwidth to be available for the transfer of the composite file. We estimate that this transfer will take about nine hours.