Lifecycle Management for vSAN Ready Nodes and VxRail Clusters: Part 2 – Cloud Foundation Use Cases
Wed, 03 Aug 2022 21:32:15 -0000
In my previous post I compared the customer experience of using vSphere Lifecycle Manager Images (vLCM Images) and VxRail Manager to maintain HCI stack integrity and complete full stack software updates for standard vSphere cluster use cases. It was clear that VxRail Manager optimized operational efficiency while taking ownership of software validation for the complete cluster, removing the burden of testing and reducing the overall risk during a lifecycle management event. However, two questions I frequently get are: Do those same values carry over when using VxRail as part of a VMware Cloud Foundation on VxRail (Dell Technologies Cloud Platform) deployment? Is vLCM Images even used in VCF on VxRail deployments? In this post I want to dive into answering these questions.
There are some excellent resources available on the VxRail InfoHub web portal, including several blog posts that discuss the unique integration between SDDC Manager and VxRail Manager in the area of LCM (like this one). I suggest that you check them out if you are unfamiliar with VCF and SDDC Manager functionality, as they will help you follow along in this post.
Before we dive in, there are a few items that you should be aware of regarding SDDC Manager and vLCM. I won’t go into all of them here, but you can check out the VCF 4.1 Release Notes, vLCM Requirements and Limitations, VCF 4.1 Admin Guide, and Tanzu documentation for more details. A few worth highlighting include:
- You cannot deploy a Service VM to an NSX Manager that is associated with a workload domain that is using vSphere Lifecycle Manager Images.
- Management domains, and thus VCF consolidated architecture deployments, can only support vSphere Lifecycle Manager Baselines (formerly known as VUM) based updates because vLCM Images use is not supported in the Management domain default cluster.
- VMware vSphere Replication is considered a non-integrated solution and cannot be used in conjunction with vLCM Images.
- While vLCM Images supports both NSX-T and vSphere with Kubernetes, it does not support both enabled at the same time within a cluster. This means you cannot use vLCM Images with vSphere with Kubernetes within a VCF workload domain deployment.
As in my last post, the main area of focus here is the customer experience with VMware Cloud Foundation when using vLCM Images and when using VxRail, specifically:
- Defining the initial baseline node image configuration
- Planning for a cluster update
- Executing the cluster update
- Sustaining cluster integrity over the long term
Oh, and one last important point to make before we get into the details. As of this writing, vLCM is only used when deploying VCF on server/vSAN Ready Nodes and is not used when deploying VCF on VxRail. As a result, all information covered here when comparing vLCM with VxRail Manager essentially compares the LCM experience of running VCF on servers/vSAN Ready Nodes vs VCF on VxRail.
Defining the Initial Baseline Node Image Configuration
How is it Done With vLCM Images?
We have covered this in detail in my previous post. The requirements for VCF-based systems remain the same, but one thing to highlight in VCF use cases is that the customer is still responsible for the installation, configuration, and ongoing updating of the hardware vendor HSM components used in the vLCM Images-based environment. SDDC Manager does not automatically deploy these components, nor does it lifecycle them for you.
VCF deployments do differ in the area of initial VCF node imaging. In VCF deployments there are two initial node imaging options for customers:
- A manual install of ESXi and associated driver and firmware packages
- A semi-automated process using a service called VMware Imaging Appliance (VIA) that runs as a part of the Cloud Builder appliance.
The VIA service tool uses a PXE Boot environment to image nodes, which need to be on the same L2 domain as the appliance and reachable over an untagged VLAN (VLAN ID 0). ESXi images and VIBs can be uploaded to the Cloud Builder appliance VIA service. Hostnames and IP addresses are assigned during the imaging process. Once initial imaging is complete, and Cloud Builder has run through its automated workflow, you are left with a provisioned Management Domain. (One important consideration here regarding initial node baseline images: customers need to ensure that the hardware and software components included in these images are validated against the VCF and ESXi software and hardware BOMs that have been certified for the version of VCF that will be installed in their environment.) This default cluster within the Management Domain cannot use vLCM Images for future cluster LCM updates.
It is when you create a new VI Workload Domain in a VCF on vSAN Ready Nodes deployment that you can choose to enable vLCM Images as your method for cluster updates. Alternatively, you can select vLCM Baselines. (Note: When using vLCM Baselines, firmware updates are not maintained as part of cluster lifecycle management.) If you opt to use vLCM Images, you cannot revert to using vLCM Baselines for cluster management. So, it is very important to choose wisely and understand which LCM operating model is needed prior to deploying the workload domain. Because this blog post focuses on vLCM Images, let’s review what is involved when you select this option.
To begin, it’s important to know that you cannot create a vLCM Images-based workload domain until you import an image into SDDC Manager. But you cannot import an image into SDDC Manager until you have a vLCM Images enabled cluster.
To get over this chicken and egg scenario, the administrator needs to create an empty cluster within the Management Domain where you can set up the image requirements and assign firmware and driver profiles that you have validated during the planning and preparation phase for the initial cluster build. The following figure provides an example of creating the temporary cluster needed to configure vLCM Images.
Figure 1 Creating a temporary cluster to enable vLCM Images as part of the initial setup.
When defining vLCM images, similar to when defining the initial baseline node images, customers are responsible for ensuring that these images are validated against the VCF software BOM that has been certified for the version of VCF that is installed in the environment.
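Because this validation is left to the customer, some administrators may want to script at least the version comparison. The following is a minimal sketch, assuming you have already transcribed the certified BOM and the planned image contents into simple dictionaries. The function name, component names, and version strings are all hypothetical placeholders, not values from any real VCF BOM.

```python
from typing import Dict, List


def find_bom_mismatches(image: Dict[str, str], bom: Dict[str, str]) -> List[str]:
    """Return one readable line per component that is missing or differs from the BOM."""
    problems = []
    for component, required in sorted(bom.items()):
        actual = image.get(component)
        if actual is None:
            problems.append(f"{component}: missing from image (BOM requires {required})")
        elif actual != required:
            problems.append(f"{component}: image has {actual}, BOM requires {required}")
    return problems


# Placeholder data for illustration only (hypothetical names and versions).
example_bom = {"esxi": "X.Y.Z-build1", "nic_driver": "1.0.0"}
example_image = {"esxi": "X.Y.Z-build1", "nic_driver": "0.9.0"}

for line in find_bom_mismatches(example_image, example_bom):
    print(line)
```

A check like this only confirms that version strings match a list you supply. It does not replace the interoperability testing described in this post.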
When you are satisfied with the image configuration and you have defined the Driver, Firmware and Cluster Profiles, export the required JSON, ESX ISO, and ZIP files from vSphere UI to your local file system, as shown in the following figure. These files include:
- SOFTWARE_SPEC_1386209123.json
- ISO_IMAGE_1904428419.iso
- OFFLINE_BUNDLE_1829789659.zip
Figure 2 Exporting Images
Next, within the vCenter UI, go to the Developer Center menu and choose the API Explorer tab. At this stage you need to run several API commands.
To do this, first select your endpoint (vCenter Server) from the drop-down option, then select the vCenter Related APIs. When completed, you will be presented with all the applicable vCenter APIs for your chosen endpoint. Expand the Cluster section and execute the GET API command for /rest/vcenter/cluster as shown in the following figure.
Figure 3 In Developer Center: List all Clusters
This displays all the clusters managed by that vCenter and provides an identifier for each cluster. Click on the vcenter.cluster.summary entry for the temporary cluster (Dell-VSRN-Temp) and copy its cluster value (Domain-c2022 in my example), which you will use in the next step.
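If you save the response from the cluster list call rather than reading it in the UI, the lookup can be scripted. This is a minimal sketch that assumes the response is JSON with the cluster summaries under a top-level "value" key, each carrying "name" and "cluster" fields. The function name and the second sample cluster are hypothetical, so verify the shape against your own output.

```python
import json
from typing import Optional


def cluster_id_by_name(response_text: str, name: str) -> Optional[str]:
    """Return the cluster identifier for the named cluster, or None if not found."""
    body = json.loads(response_text)
    # Assumption: summaries are wrapped in {"value": [...]}; accept a bare list too.
    summaries = body.get("value", []) if isinstance(body, dict) else body
    for summary in summaries:
        if summary.get("name") == name:
            return summary.get("cluster")
    return None


# Sample data shaped like the example in this post; "Mgmt-Cluster" is made up.
sample = json.dumps({"value": [
    {"cluster": "Domain-c8", "name": "Mgmt-Cluster"},
    {"cluster": "Domain-c2022", "name": "Dell-VSRN-Temp"},
]})

print(cluster_id_by_name(sample, "Dell-VSRN-Temp"))
```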
Change the focus on the API explorer to ESX and execute a GET API command for /api/esx/settings/clusters/Domain-c2022/software.
Fill in the cluster id parameter (Domain-c2022) as the required value to run the API command (see the following figure). Once executed, click on the download json option and an additional json file downloads to your local file system.
Figure 4 Execute the Cluster software API Command
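For readers who prefer to script this call instead of using the API Explorer, here is a minimal sketch that only builds the request URL from the cluster identifier. The host name is a placeholder and the helper name is my own. Sending the request also needs an authenticated vCenter session, which I have deliberately left out.

```python
from urllib.parse import quote


def cluster_software_url(vcenter_host: str, cluster_id: str) -> str:
    """Build the URL for the cluster software GET call described above."""
    if not cluster_id:
        raise ValueError("cluster_id must not be empty")
    # Percent-encode the identifier so it is safe to place in the URL path.
    encoded = quote(cluster_id, safe="")
    return f"https://{vcenter_host}/api/esx/settings/clusters/{encoded}/software"


# "vcenter.example.com" is a placeholder host name.
print(cluster_software_url("vcenter.example.com", "Domain-c2022"))
```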
At this point in time, you have four files:
- SOFTWARE_SPEC_1386209123.json
- ISO_IMAGE_1904428419.iso
- OFFLINE_BUNDLE_1829789659.zip
- Response-body.json
Finally, within SDDC Manager, select Repository then Image Management and Import Cluster Image. Here you need to import the four files mentioned above. As you import the individual files, make sure that you specify a name for the cluster image and import them in the correct order. Once the import is successful, you can now start to deploy your first vLCM Images enabled workload domain.
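Before starting the import, it can help to confirm that nothing is missing from your export folder. The sketch below is a simple pre-check that assumes the four files can be recognized by extension alone (two JSON files, one ISO, one ZIP). The function name is hypothetical, and the check says nothing about the import order, which you still need to follow in SDDC Manager.

```python
from collections import Counter
from pathlib import PurePath
from typing import Dict, Iterable

# Assumption: two .json files (software spec and API response), one .iso, one .zip.
REQUIRED_BY_EXTENSION = {".json": 2, ".iso": 1, ".zip": 1}


def missing_by_extension(filenames: Iterable[str]) -> Dict[str, int]:
    """Return how many files of each required extension are still missing."""
    found = Counter(PurePath(name).suffix.lower() for name in filenames)
    return {
        ext: needed - found[ext]
        for ext, needed in REQUIRED_BY_EXTENSION.items()
        if found[ext] < needed
    }


exported = [
    "SOFTWARE_SPEC_1386209123.json",
    "ISO_IMAGE_1904428419.iso",
    "OFFLINE_BUNDLE_1829789659.zip",
]
print(missing_by_extension(exported))  # one .json (the API response) is still missing
```

In practice you could pass in the result of os.listdir() for your download folder.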
How is it Done Using VxRail LCM?
VxRail key integrations with Cloud Foundation start even before any VCF on VxRail components are installed, at Dell facilities as part of the Dell manufacturing process. Here, the nodes are loaded with a VxRail Continuously Validated State image that includes all pre-validated vSphere, vSAN, and hardware firmware components. This means that once VxRail nodes are racked, stacked, and powered on within your datacenter, they are ready to be used to install a new VCF instance, create new workload domains, expand existing workload domains with new clusters, or expand clusters on an existing system.
For new VCF deployments, Cloud Builder has unique integrated workflows that tailor a streamlined deployment process with VxRail, leveraging existing capabilities for VxRail cluster management operations. Once SDDC Manager is deployed using the Cloud Builder connectivity, two update bundle repositories can then be configured.
Figure 5 SDDC Manager Repository Settings
The first is the VMware repository, which is used for VMware software such as vSphere, NSX, and SDDC Manager. The second is the Dell EMC repository for the VxRail software. Once you configure and authenticate with the appropriate user account credentials in SDDC Manager, it will automatically connect to the VxRail repository at Dell EMC and pull down the next available VxRail update package. Each available VxRail update package will have already been validated, tested, and certified with the version of VCF running in the customer’s environment.
Figure 6 VxRail Software Bundle in SDDC Manager
The following figure summarizes the steps needed for defining initial baseline node images for VCF using vLCM Images and VCF using VxRail Manager.
Figure 7 Initial baseline node images and configuration
Planning for a Cluster Update
How is it Done Using vLCM Images?
Although we have reviewed this in detail before, it is worth mentioning here again. Ownership of this process lies on the shoulders of the administrator. In this model, customers take on the responsibility of validating and testing the software and driver combination of their desired state image to ensure full stack interoperability and integrity, and of ensuring that the component versions fall within the supported VCF software BOM being used in their environment.
How is it Done Using VxRail LCM?
The VxRail approach is very different. The VxRail engineering teams spend thousands of test hours across multiple platforms to validate each release. The end user is given a single image to leverage, knowing that Dell Technologies has completed the very heavy lift of platform validation. As I mentioned above, SDDC Manager will download the correct bundle based on your VCF release and mark it as available within your SDDC Manager. When a customer sees a new image available, they are guaranteed that it is already compatible with their VCF deployment. This curated bundle management and validation is part of the turnkey experience customers gain with VCF on VxRail.
The following figure illustrates the differences in planning a cluster update for VCF with vLCM Images and VCF with VxRail.
Figure 8 Planning for a cluster update
Executing the Cluster Update
How is it Done Using vLCM Images?
Defining the baseline node image is vital for defining the hardware health of your cluster. Defining a target version for your system’s next update is equally important. It should involve testing the specific combination of components for the image that is desired. This would be in addition to some of the standard interoperability validation performed by the Ready Node hardware vendor when updates to server hardware firmware and drivers are released. Once the hardware baseline is known, the ESXi image must be imported into vCenter. Drivers, firmware, and Cluster Profiles must then be defined in vCenter so they are ready to be exported.
We use the same process as originally outlined for the initial setup: Export the images, run the relevant API calls, and import the files into SDDC Manager. Every future update will follow the same process. Additional firmware and driver profiles will have to be created if new workload domains or clusters are added with different server hardware configurations. Thus, a deployment that caters to multiple hardware use cases will end up with several driver/firmware profiles that will need to be managed and tested independently.
How is it Done Using VxRail LCM?
SDDC Manager is the orchestration engine. It:
- Defines when each update is applicable
- Ensures that each update is made available in the correct order
- Ensures that components such as SDDC Manager, vCenter, NSX-T, and VxRail components are updated and coordinated in the correct manner.
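To make the idea of applicability and ordering concrete, here is a toy sketch of version-gated sequencing. It is purely illustrative and is not SDDC Manager's actual logic: the Bundle type, its fields, and the version labels are all hypothetical. Each bundle declares the version it upgrades from, and only a bundle matching the current version is offered next.

```python
from typing import List, NamedTuple, Optional


class Bundle(NamedTuple):
    name: str
    from_version: str
    to_version: str


def next_applicable(current: str, bundles: List[Bundle]) -> Optional[Bundle]:
    """Return the bundle that can be applied to the current version, if any."""
    for bundle in bundles:
        if bundle.from_version == current:
            return bundle
    return None


def upgrade_path(current: str, bundles: List[Bundle]) -> List[Bundle]:
    """Chain applicable bundles in order, stopping when none applies."""
    path = []
    seen = {current}
    while True:
        bundle = next_applicable(current, bundles)
        if bundle is None or bundle.to_version in seen:  # guard against loops
            return path
        path.append(bundle)
        current = bundle.to_version
        seen.add(current)


# Hypothetical bundles, deliberately listed out of order.
bundles = [Bundle("update-2", "B", "C"), Bundle("update-1", "A", "B")]
print([b.name for b in upgrade_path("A", bundles)])
```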
For VxRail LCM updates, SDDC Manager will send API calls directly to each VxRail Manager for every cluster being updated to initiate a cluster upgrade. From that point on VxRail Manager will take ownership of the VxRail update execution using the same native VxRail Manager LCM execution process that is used in non-VCF VxRail deployments. During LCM execution, VxRail Manager provides constant feedback to the SDDC Manager throughout the process. VxRail updates these components:
- VMware ESXi
- vSAN
- Hardware firmware
- Hardware drivers
To understand the full range of hardware components that are updated with each release, I urge you to check out the VxRail 7.0 Support Matrix.
The following figure summarizes the steps required to execute cluster updates for VCF with vLCM Images and VCF with VxRail.
Figure 9 Executing a cluster update workflow
Sustaining Cluster Integrity Over the Long Term
How is it Done Using vLCM Images?
Unlike standalone vSphere cluster deployments, where vLCM Images manages images on a per-cluster basis, VMware Cloud Foundation allows you to manage all cluster images once they are imported and to repurpose them for other clusters or workload domains. This is a definite improvement, but each new update requires you to create the image, firmware, and driver combinations in vCenter first and then import them into SDDC Manager. Of course, this is after you have repeated the planning phases and have completed all the driver and firmware interoperability testing.
Also, it is important to note that if your cluster is being managed by vLCM Images, and you need to expand your clusters with hardware that is not identical to the original hosts (this can happen in situations in which hardware components go end of sale or you have different hardware or firmware requirements for different nodes), you can no longer leverage vLCM Images or change back to using vLCM Baselines. So proper planning is very important.
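As part of that planning, you might keep a simple inventory and check a proposed expansion against it before ordering hardware. The sketch below is one hedged way to do that. It assumes that comparing vendor and model is a good enough proxy for "identical", which may be too coarse for your environment, and every name and field in it is hypothetical.

```python
from typing import Dict, List, Tuple


def hardware_profiles(hosts: List[Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group host names by (vendor, model)."""
    profiles: Dict[Tuple[str, str], List[str]] = {}
    for host in hosts:
        key = (host["vendor"], host["model"])
        profiles.setdefault(key, []).append(host["name"])
    return profiles


def is_homogeneous(hosts: List[Dict[str, str]]) -> bool:
    """True when every host shares a single (vendor, model) profile."""
    return len(hardware_profiles(hosts)) <= 1


# Placeholder inventory for illustration only.
existing = [
    {"name": "host-1", "vendor": "VendorA", "model": "Model-1"},
    {"name": "host-2", "vendor": "VendorA", "model": "Model-1"},
]
proposed = existing + [{"name": "host-3", "vendor": "VendorA", "model": "Model-2"}]

print(is_homogeneous(existing), is_homogeneous(proposed))
```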
How is it Done Using VxRail LCM?
VxRail LCM supports customers’ ability to grow their clusters with heterogeneous nodes over time. Different generations of servers or servers with differing hardware characteristics can be mixed within a cluster, in accordance with application profile requirements. A single pre-validated image will be made available that will cover all hardware profiles. All of this is factored into each VxRail Continuously Validated State update bundle that is applied to each individual cluster based on its current component version state.
Conclusion
When we piece together the bigger picture with all the LCM stages combined, it provides an excellent representation of the ease of management when VxRail is at the heart of your VCF deployment.
Figure 10 Comparing vLCM Images and VxRail LCM cluster update operations
It’s clear to see that VxRail, with its pre-validated engineered approach, can provide a differentiated customer experience when it comes to operational efficiency, during both the initial deployment phase and the continuous lifecycle management of the HCI.
While vLCM Images provides a significant improvement from manually applying the updates, the planning and testing required can become quite iterative. And when newer hardware profiles are introduced over the lifespan of the system, things could become more difficult to manage, introducing additional complexity.
By contrast, VxRail provides a single update file for each release that is curated and made accessible within SDDC Manager natively, with no additional administration effort required. It’s simplicity at its finest, and simplicity is at the core of the VxRail turnkey customer experience.
Cliff Cahill
Dell EMC VxRail Engineering Technologist
Twitter: @cliffcahill
LinkedIn: http://linkedin.com/in/cliffcahill
Additional Resources
VCF on VxRail Interactive Demo
VxRail page on DellTechnologies.com