VxRail Cluster Integrity and Resilience
Tue, 01 Nov 2022 13:30:05 -0000
This is the fourth article in a series introducing VxRail concepts.
Maintaining cluster integrity is essential to normal business operations. Not every reader will be familiar with the term, so let’s define it: cluster integrity describes a state where a cluster remains free of hardware and software errors. One of the primary challenges to maintaining cluster integrity is handling change through the cluster life cycle. It’s very likely that an administrator will need to make some kind of change, whether to address something minor, like a disk failure, or something more complex, like adding new hardware to a cluster. Software updates are a different kind of challenge. Each cluster node has drivers, firmware, an operating system, and the more elaborate VMware and VxRail system software, adding up to a considerable number of files to keep track of. VxRail helps by handling these components as a single, compatibility-tested set rather than as independent patches.
The hardware life cycle
As a Dell Support specialist, I spoke with customers about many challenges, and one of the biggest was moving through the hardware life cycle. For customers working in a nonclustered environment, hardware refreshes frequently come along with highly involved migrations. Administrators can build their own clusters to better provision IT resources, but these clusters are made from off-the-shelf hardware that isn’t necessarily intended to work together. VxRail HCI system software simplifies changes such as scaling out and adding drives.
Part of the VxRail advantage is that it gives administrators confidence while facilitating change. Continuously Validated States help to achieve this by providing administrators with hardware choices they can select, knowing that their cluster’s current state was built to support that new hardware. For example, maybe you have a 3-node cluster and are ready to add nodes and expand it to a 5-node cluster, but a matching NIC isn’t available. Continuously Validated States define other NICs that will work without creating compatibility problems. When new nodes are added to the cluster, automation scripting scans the network, identifies the available hosts, and orchestrates node assignment to the cluster. This allows customers to scale out a cluster quite easily.
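To make the NIC example concrete, here is a minimal Python sketch of the idea. It is not VxRail code and does not use any VxRail API: the `Node` and `ValidatedState` classes, the `eligible_nodes` and `expand_cluster` functions, and the NIC model names are all hypothetical, invented only to illustrate how a validated hardware list can drive node selection when a 3-node cluster grows to 5.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Node:
    hostname: str
    nic_model: str  # placeholder model name, not a real part number


@dataclass(frozen=True)
class ValidatedState:
    """Hypothetical stand-in for a Continuously Validated State."""
    name: str
    supported_nics: FrozenSet[str]


def eligible_nodes(discovered: List[Node], state: ValidatedState) -> List[Node]:
    """Keep only the discovered nodes whose NIC the validated state supports."""
    return [n for n in discovered if n.nic_model in state.supported_nics]


def expand_cluster(cluster: List[Node], discovered: List[Node],
                   state: ValidatedState, target_size: int) -> List[Node]:
    """Add eligible nodes, in discovery order, until target_size is reached."""
    expanded = list(cluster)
    for node in eligible_nodes(discovered, state):
        if len(expanded) >= target_size:
            break
        expanded.append(node)
    return expanded


# A 3-node cluster built on "nic-a"; the validated state also allows "nic-b".
cluster = [Node("node-01", "nic-a"), Node("node-02", "nic-a"), Node("node-03", "nic-a")]
discovered = [Node("node-04", "nic-b"), Node("node-05", "nic-x"), Node("node-06", "nic-a")]
state = ValidatedState("example-state", frozenset({"nic-a", "nic-b"}))

expanded = expand_cluster(cluster, discovered, state, target_size=5)
print([n.hostname for n in expanded])
```

In this sketch, node-04 is accepted even though its NIC does not match the existing nodes, because the validated state lists it as supported, while node-05 is skipped because its NIC is not on the list.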
Nodes can also be removed from a cluster with similar scripting. Used in conjunction with the node addition automation, this scripting helps clusters make intergenerational migrations. Once the new-generation hosts arrive at the data center, they are added to the cluster with one wizard, and the older hosts can be removed with another to complete the life cycle move. However, VxRail also supports heterogeneous clusters. This allows you to continue using the older cluster nodes for as long as they are needed, provided they comply with each of the cluster’s Continuously Validated States as the cluster progresses through the life cycle.
Software patching
Continuing with our travel analogy from a previous blog in this series, if the update process is a vehicle, then each patch is a piece of cargo that an administrator has to worry about. These patches can quickly build up into backlogs for IT staff, even if only some of them need to be applied to clusters. VxRail life cycle management processes improve patch control by consolidating these independent release cycles into single bundles whose software packages have confirmed compatibility with one another. This helps promote cluster integrity by creating order among these patches. The patches are bundled into VxRail releases that are made available to cluster owners within 30 days of VMware’s releases. IT staff can then use these resources to be more selective in the patching process and can think of these cycles as opportunities to add new features and functionality to clusters, as opposed to a task with no clearly defined benefit or purpose.
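As a rough illustration of why a single bundle is easier to reason about than a pile of independent patches, the Python sketch below compares what a node has installed against one bundle and reports only what would change. The `pending_updates` function, the component names, and the version strings are hypothetical placeholders, not actual VxRail bundle contents or tooling.

```python
from typing import Dict, Optional, Tuple


def pending_updates(installed: Dict[str, str],
                    bundle: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {component: (installed_version, bundle_version)} for every
    component the bundle would change. Components already at the bundle
    version are left out; components missing from the node show None."""
    return {
        name: (installed.get(name), new_version)
        for name, new_version in bundle.items()
        if installed.get(name) != new_version
    }


# Placeholder component names and versions, for illustration only.
installed = {"hypervisor": "1.0", "storage": "1.0", "nic-firmware": "2.3"}
bundle = {"hypervisor": "1.1", "storage": "1.0", "nic-firmware": "2.4"}

changes = pending_updates(installed, bundle)
for name, (old, new) in sorted(changes.items()):
    print(f"{name}: {old} -> {new}")
```

The point of the sketch is that the administrator reviews one bundle, already tested as a set, instead of tracking each component’s release cycle separately.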
Conclusion
Maintaining cluster integrity means maintaining the stable and productive working state that businesses need nodes to be in. An HCI cluster built internally faces additional challenges maintaining this integrity, especially as software and hardware changes are needed in the environment. VxRail Continuously Validated States help to both broaden the viable hardware options and provide software patch control through update bundles that bake in the patches. Orchestration automation serves to control the cluster as patches are applied or when hardware changes, such as adding new nodes to the cluster, are made. The next article in this series will discuss serviceability and include topics like disk replacement, alerts and events, the overall support experience, and more!