Delivering VxRail simplicity with vLCM compatibility
Tue, 28 Sep 2021 17:23:22 -0000
As the days start off with cooler mornings and later sunrises, we welcome the autumn season. Growing up, each season brought forth its own traditions and activities. While venturing through corn mazes was fun, autumn first and foremost meant that it was apple-picking time. Combing through the orchard, you’re constantly looking for which apple to pick, even comparing ones from the same branch because no two are alike. The same goes for the newly introduced VMware vSphere Lifecycle Manager (vLCM) compatibility in VxRail 7.0.240: the VxRail implementation differs from that of the Dell EMC vSAN Ready Nodes, even though they’re from the same vLCM “branch.”
Now that VxRail offers vLCM compatibility, it’s a good opportunity to provide an update to Cliff’s blog post last year where he provided a comprehensive review of the customer experiences with lifecycle management of vSAN Ready Nodes and VxRail clusters. While my previous blog post about the VxRail 7.0.240 release provided a summary of VxRail’s vLCM implementation and the added value, I’ll focus more on customer experience this time. Combining the practice of Continuously Validated States to ensure cluster integrity with a VxRail-driven experience truly showcases how automated the vLCM process can be.
In this blog, I’ll cover the following:
- Overview of VMware vLCM
- Compare how to establish a baseline image
- Compare how to perform a cluster update
Overview of VMware vLCM
Figure 1: VMware vSphere Lifecycle Manager (vLCM) framework
VMware vLCM was introduced in vSphere 7.0 as a framework to allow for software and hardware to be updated together as a single system. Being able to combine the ESXi image and component firmware and drivers into a single workflow helps streamline the update experience. To do that, server vendors are tasked with developing their own plugin for this vLCM framework to perform the function of the firmware and drivers addon depicted in Figure 1. The server vendor implementation provides functionality to build the hardware catalog of firmware and drivers on the server and supply the bits to vCenter. For some components, the server vendor does not supply the firmware and drivers and instead relies on the individual component vendors to provide the addon capability. Put together, the software and hardware form a cluster image. To start using vLCM, you need to build out a cluster image and assign it as the baseline image. For future updates, you have to build out a cluster image and assign it as the desired state image. Drift detection between the two determines what needs to be remediated for the cluster to arrive at the desired state.
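To make drift detection a little more concrete, here is a minimal Python sketch. It assumes a cluster image can be modeled as a simple mapping of component name to version string, which is a simplification for illustration and not the actual vLCM data model. The component names and version numbers are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a cluster image as component name -> version string.
# This is an illustration only, not the real vLCM image format.
ClusterImage = Dict[str, str]


def detect_drift(current: ClusterImage, desired: ClusterImage) -> List[Tuple[str, str, str]]:
    """Return (component, current_version, desired_version) for every mismatch."""
    drift = []
    for component, desired_version in sorted(desired.items()):
        current_version = current.get(component, "not installed")
        if current_version != desired_version:
            drift.append((component, current_version, desired_version))
    return drift


# Made-up versions, used only to exercise the comparison.
baseline = {"esxi": "7.0.2", "bios": "2.9.4", "nic-firmware": "20.5.13"}
desired = {"esxi": "7.0.3", "bios": "2.9.4", "nic-firmware": "21.0.1"}

for component, have, want in detect_drift(baseline, desired):
    print(f"{component}: {have} -> {want}")
```

Only the components that differ show up in the result, which is the list that remediation would need to act on.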
For Dell EMC vSAN Ready Nodes, you will use the OMIVV (OpenManage Integration with VMware vCenter) plugin for vCenter to use the vLCM framework. Now VxRail has enhanced VxRail Manager to plug into vCenter in its vLCM implementation. The difference between the two implementations really drives home that vSAN Ready Nodes, whether they are Dell EMC’s or another server vendor’s, deliver a customer-driven experience versus a VxRail-driven experience. Both implementations have their merits because they target different customer problems. The customer-driven experience makes sense for customers who have already invested the IT resources to have more operational control of what is installed on their clusters. For customers looking for operational efficiency that reduces and simplifies their day-to-day responsibility to administer and secure infrastructure, the VxRail-driven experience provides them with the confidence to do so.
Enabling VMware vLCM with the baseline image
A baseline image is a cluster image that you have identified as the version set to deliver that happy state for your cluster. The IT operations team is happy because the cluster is running secure and stable code that complies with their company’s security standards. End users of the applications running on the cluster are happy because they are getting the consistent service required to perform their jobs.
For Dell EMC vSAN Ready Nodes or any vSAN Ready Nodes, users first need to arrive at what the baseline image should be before deploying their clusters. That requires research and testing to validate that the set of firmware and drivers is compatible and interoperable with the ESXi image. Importing it into the vLCM framework involves a series of steps.
Dell EMC vSAN Ready Nodes use the OMIVV plugin to interface with vCenter Server. A user needs to first deploy this OMIVV virtual machine on vCenter.
- Once deployed, the user has to register it with vCenter Server.
- From the vCenter UI, the user must configure the host credentials profile for iDRAC and the host.
- To acquire the bits for the firmware and drivers, the user needs to install the Dell Repository Manager, which provides the depot for all firmware and drivers. Here is where the user can build the catalog of firmware and drivers component-by-component (BIOS, NICs, storage controllers, IO controllers, and so on) for their cluster.
- With the catalog in place, the user uploads each file into an NFS/CIFS share that the vCenter Server can access.
- From the vCenter UI, the user creates a repository profile that points to the share with the firmware and drivers. The next step is to define the cluster profile with the ESXi image running on the cluster and the repository profile. This cluster profile becomes the baseline image for future compliance checks and drift remediation scans.
For VxRail, vLCM is not automatically enabled once your cluster is updated to VxRail 7.0.240. It’s a decision you make based on the benefits that vLCM compatibility provides (described in my previous blog post). Once enabled, it cannot be disabled. To enable vLCM, your VxRail cluster needs to be running in a Continuously Validated State. It is a good idea to run the compliance checker first.
Once you have made the decision to move forward, VxRail’s vLCM implementation is astoundingly simple! There’s no need for you to define the baseline image because you’re already running in a Continuously Validated State. The VxRail implementation hides the plugin interaction and uses the vLCM APIs to automate all the previously described manual steps. As a result, enabling vLCM and establishing the baseline image have been reduced to a three-step process.
- Enter the vCenter user credentials.
- VxRail automatically performs a compliance check to verify the cluster is running in a Continuously Validated State.
- VxRail automatically ports the Continuously Validated State into the formation of the baseline image.
And that’s it! The following video clip captures the compliance check you can run first and then the three-step process to enable vLCM:
Figure 3: How to enable vLCM on VxRail
Cluster update with vLCM
For Dell EMC vSAN Ready Nodes, the customer-driven process to build the desired state image is similar to the baseline image. It requires investigation, research, and testing to define the next happy state and the use of the Dell Repository Manager to save and export the hardware catalog to vCenter. From there, users build out a cluster image that includes the ESXi image and the hardware catalog that becomes the desired state image.
Not surprisingly, performing a cluster update with vLCM doesn’t fall too far from the VxRail tree: VxRail streamlines that process down to a few steps within VxRail Manager. By using vLCM APIs, VxRail incorporates the vLCM process into the VxRail Manager experience for a complete LCM experience.
Figure 4: Process to perform cluster update with VxRail
- From the new update advisor tool, select the target VxRail version to which you want to update your cluster. The update advisor then generates a drift remediation report (called an advisory report) that provides a component-by-component analysis of what needs to be updated. This information along with estimated update time will help you plan the length of your maintenance window.
- Running a cluster readiness precheck ahead of your maintenance window is good practice. It allows you time to address any issues that may be found ahead of your scheduled window or to plan for additional time.
- Once the precheck has passed, VxRail Manager incorporates the vLCM process into its own experience. VxRail Manager includes the vendor addon capability in vLCM so that you can add separate firmware and drivers that are not part of the VxRail Continuously Validated State, such as a Fibre Channel HBA. Using the vLCM APIs, VxRail can automatically port the Continuously Validated State LCM bundle and any non-VxRail managed component firmware and drivers into the cluster image for remediation.
- If you want to customize the cluster image even more with NSX-T or Tanzu VIBs, you can add them from the vCenter UI. Once they are included in the desired state image, you have the option of initiating the remediation from either vCenter or the VxRail Manager UI. For those not adding these VIBs, the entire cluster update experience stays within the simple and familiar VxRail Manager experience.
Check out the following video clip to see this end-to-end process in action:
Figure 5: How to update your VxRail cluster with VMware vLCM
With both Dell EMC vSAN Ready Nodes and VxRail using the same vLCM framework, it’s a much easier task to deliver an apples-to-apples comparison that clearly shows the simplicity of VxRail LCM with vLCM compatibility. This vLCM implementation is a perfect example of how VxRail is built with VMware and made to enhance VMware. We’ve integrated the innovations of vLCM into the simple and streamlined VxRail-driven experience. As VMware looks to deliver more features to vLCM, VxRail is well positioned to present these capabilities in VxRail fashion.
For more information about this topic, check out the latest podcast: https://infohub.delltechnologies.com/p/vxrail-vlcm-compatibility/
Daniel Chiu, Senior Technical Marketing Manager at Dell Technologies
Related Blog Posts
Take Advantage of the Latest Enhancements to VxRail Life Cycle Management
Tue, 20 Jun 2023 16:52:40 -0000
Providing the best life cycle management experience for HCI is not easy, nor is it a one-time job for which we can pat ourselves on the back and move on to the next endeavor. It’s a continuous cycle that incorporates feature enhancements and improvements based on your feedback. While we know that improving VxRail LCM is vitally important for us to continue to deliver differentiating value to you, it is just as important that your clusters continue to run the latest software to realize the benefits. In this post, I’ll provide a deep dive into the LCM enhancements introduced in the past few software releases so you can consider the added functionality that you can benefit from.
Focus areas for improved LCM
Going back into last year, we prioritized four focus areas to improve your LCM experience. While the value is incremental when you look at just a single software release, this post provides a holistic perspective of how VxRail has improved upon LCM over time to further increase the efficiencies that you enjoy today.
- Based on data that we have gathered on reported cluster update failures, we found that almost half of the update failures occurred because a node failed to enter maintenance mode. Effectively addressing this issue can potentially be the most impactful benefit for our customer base.
- As the VxRail footprint expands beyond the data center, resource constraints such as network bandwidth and Internet connectivity can become significant hurdles for effectively deploying infrastructure solutions at the edge. Recent enhancements in VxRail focused on creating space-efficient LCM bundle transfers.
- Doing more with less is a common thread across all organizations and industries. In the context of VxRail LCM, we’re looking to further simplify your cluster update planning experience by putting more actionable information at your fingertips.
- While no product, including VxRail, can avoid a failure from ever happening, VxRail looks to put you in a better position to protect your cluster and quickly recover from a failure.
Figure 1. 12+ month recap of LCM enhancements
Now that you know about the four focus areas, let’s get into the details about the actual improvements that have been introduced in the last 12+ months.
Mitigating maintenance mode failures
In our investigation, we were able to identify three major issues that caused a cluster update failure because a node did not enter maintenance mode as expected:
- VMware Tools was still mounted on a VM.
- VMs were pinned to a host due to an existing policy.
- vSAN resynchronization was taking too long and exceeded the timeout value.
In VxRail 7.0.350, prechecks were added for the first two issues. When a pre-update health check is run, these new VxRail prechecks identify those issues if they exist and alert you in the report so that you can remedy the issue before initiating a cluster update. In the same release, the timeout value to wait for a node to enter maintenance mode was doubled to reduce the chance that vSAN resynchronization does not finish in time.
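As a rough illustration of what those two prechecks look for, the following Python sketch flags VMs that would block a node from entering maintenance mode. The `VM` class, its field names, and the wording of the findings are hypothetical and are not taken from the VxRail precheck implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    tools_installer_mounted: bool = False  # VMware Tools installer still mounted
    pinned_to_host: bool = False           # a policy keeps the VM on its host


def maintenance_mode_precheck(vms: List[VM]) -> List[str]:
    """Return one finding per condition that could block maintenance mode."""
    findings = []
    for vm in vms:
        if vm.tools_installer_mounted:
            findings.append(f"{vm.name}: VMware Tools installer is still mounted")
        if vm.pinned_to_host:
            findings.append(f"{vm.name}: pinned to its host by an existing policy")
    return findings


inventory = [
    VM("app-01"),
    VM("db-01", tools_installer_mounted=True),
    VM("fw-01", pinned_to_host=True),
]

for finding in maintenance_mode_precheck(inventory):
    print(finding)
```

An empty list means that neither condition was found, so these two causes would not block the update.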
Next, the cluster update capability set was also enhanced to address a cluster update failure due to a node not entering maintenance mode as expected. With the combination of enhancements made to cluster update error handling and cluster update retry operations in VxRail 7.0.350 and VxRail 7.0.400 respectively, VxRail is now able to handle this scenario much more efficiently. If a node fails to enter maintenance mode, the cluster update operation now skips the node and continues on to the next node instead of failing out of the operation altogether. Upon running the cluster update retry operation, VxRail can automatically detect which node requires an update instead of updating the entire cluster.
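The skip-and-retry behavior can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name, the `enter_maintenance_mode` callback, and the "old"/"new" version labels are all hypothetical, and the real update does far more per node.

```python
from typing import Callable, Dict, List


def update_cluster(
    versions: Dict[str, str],
    target: str,
    enter_maintenance_mode: Callable[[str], bool],
) -> List[str]:
    """Update nodes one at a time and return the nodes that were skipped."""
    skipped = []
    for node in versions:
        if versions[node] == target:
            continue  # already at the target version, so a retry leaves it alone
        if not enter_maintenance_mode(node):
            skipped.append(node)  # skip this node and continue with the next one
            continue
        versions[node] = target  # stand-in for the real per-node update
    return skipped


versions = {"node-1": "old", "node-2": "old", "node-3": "old"}

# First pass: node-2 cannot enter maintenance mode, the others are updated.
first_pass_skipped = update_cluster(versions, "new", lambda node: node != "node-2")

# Retry: record which nodes are actually attempted this time.
attempted = []


def always_enters(node: str) -> bool:
    attempted.append(node)
    return True


retry_skipped = update_cluster(versions, "new", always_enters)
print(first_pass_skipped, retry_skipped, attempted)
```

On the retry, only the node that was skipped earlier is attempted, because the other nodes already report the target version.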
Space-efficient LCM bundle transfers
The next area of improvement addressed reducing the package sizes of the LCM bundles. A smaller package size can be very beneficial for bandwidth-constrained environments such as edge locations.
VxRail 7.0.350 introduced the capability for you to designate a local Windows client at your data center to be the central repository and distributor of LCM bundles for remote VxRail clusters that are not connected to the Internet. Using a separate PowerShell cmdlet installed on the client, you can initiate space-efficient bundle transfers from the client to your remote clusters in your internal network. The transfer operation automatically scans the manifest of the Continuously Validated State (VxRail software version) running on the VxRail cluster and determines the delta compared to the requested LCM bundle. Instead of transferring the full LCM bundle, which is greater than 10 GB in size, it only packages the necessary installation files. A much smaller LCM bundle can cut down on bandwidth usage and transfer times.
Figure 2. Central repository and distributor of LCM bundles to remote VxRail clusters
In VxRail 7.0.450, space-efficient LCM bundles can also be created when VxRail Manager downloads an LCM bundle from the Dell cloud. This feature requires that the VxRail Manager be connected to the Dell cloud.
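The delta calculation behind a space-efficient bundle can be pictured with a short Python sketch. It assumes a manifest can be reduced to a mapping of file name to version, which is an assumption for illustration. The `BundleFile` type, file names, versions, and sizes are all invented and do not reflect real bundle contents.

```python
from typing import Dict, NamedTuple


class BundleFile(NamedTuple):
    """Hypothetical entry in an LCM bundle manifest."""
    version: str
    size_mb: int


def plan_delta_transfer(
    installed: Dict[str, str], bundle: Dict[str, BundleFile]
) -> Dict[str, BundleFile]:
    """Keep only the bundle files whose version differs from what is installed."""
    return {
        name: entry
        for name, entry in bundle.items()
        if installed.get(name) != entry.version
    }


# Made-up manifest data for illustration.
installed = {"esxi": "1.0", "manager": "1.0", "bios": "2.0"}
bundle = {
    "esxi": BundleFile("1.1", 400),
    "manager": BundleFile("1.0", 3000),
    "bios": BundleFile("2.1", 30),
}

delta = plan_delta_transfer(installed, bundle)
full_mb = sum(entry.size_mb for entry in bundle.values())
delta_mb = sum(entry.size_mb for entry in delta.values())
print(f"full bundle: {full_mb} MB, delta transfer: {delta_mb} MB")
```

In this made-up case the unchanged file is the largest one, so leaving it out accounts for most of the savings.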
Simplified cluster update planning experience
The next set of LCM enhancements is centered around providing you with critical insights to maximize the probability of a successful cluster update and for the information to be up-to-date and readily available whenever you need it.
Since VxRail 7.0.400, the pre-update health check includes a RecoverPoint for VMs compatibility precheck to detect whether its current version of software is compatible with the target VxRail software version.
VxRail 7.0.450 increased the frequency at which the VxRail prechecks file is updated. The increased frequency ensures that any additional prechecks added by engineering because of technology changes or new learnings from support cases are incorporated into the VxRail prechecks file that is run against your cluster. When your cluster is connected to the Dell cloud, VxRail Manager periodically scans for the latest VxRail prechecks file.
VxRail 7.0.450 also automated the health check to run every 24 hours. The combination of automated VxRail prechecks file scans and health check runs ensures that you have access to an up-to-date health check report once you log in to VxRail Manager.
VxRail 7.0.450 also further simplified your cluster update planning experience by consolidating into a single, exportable report all the necessary insights about your cluster to help you decide whether to move forward with a cluster update. This update advisor report has four sections:
- VxRail Update Advisor Report Summary includes the current VxRail version running on the cluster, the target (or selected) VxRail version, estimated duration to complete a cluster update, a link to the release notes, and information about your backup for your service VMs.
Figure 3. Update advisor report—summary report
- VxRail Components shows which components need to be updated to get to the target VxRail version. The table includes the current version and target version for each component.
Figure 4. Update advisor report—components report
- VxRail Precheck is the previously mentioned pre-update health check report, inclusive of all the enhancements discussed.
Figure 5. Update advisor report—LCM precheck report
- VxRail Custom Components is a report that highlights user-managed components installed on the cluster. You should consider these custom components when deciding whether to schedule a cluster update.
Figure 6. Update advisor report—custom components report
When VxRail Manager is connected to the Dell cloud, it automatically scans for new update paths. Once a new update path is detected, VxRail Manager downloads a lightweight manifest file that contains all the information needed to produce the update advisor report. The report is automatically generated every 24 hours. This feature is designed to streamline the availability of up-to-date critical insights to help you make an informed decision about a cluster update.
Serviceability
The last set of LCM enhancements that I will cover is around serviceability. While many of the features discussed earlier are meant to be proactive and to prevent failures, there are times when failures can still occur. Being able to efficiently troubleshoot the issues is critically important to getting your clusters back up and running quickly.
In VxRail 7.0.410, the logging capability was enhanced in a couple of areas so that the Dell Support team can pinpoint issues faster. When a pre-update health check identifies failures, the offending host is now recorded. If a node does fail to enter maintenance mode, the logs now capture the reason for the failure.
In VxRail 7.0.450, we automated the backup of the VxRail Manager VM and vCenter Server VM (if it’s VxRail managed). Now you can easily back up your service VMs before updating a cluster.
Figure 7. Automate VxRail backup of service VMs before a cluster update
This feature is also integrated into the update advisor report, where you can see the latest backup on the report summary and click a link to go to the backup page to create another backup.
Value of VxRail life cycle management
If life cycle management is one of the major reasons that you chose to invest in VxRail, our continuous improvements to life cycle management should be a compelling reason to keep your clusters running the latest software. VxRail life cycle management continues to provide significant value by addressing the challenges that your organization faces today.
Figure 8. VxRail benefits (data from "The Business Value of Dell VxRail HCI," April 2023, IDC)
According to an IDC study sponsored by Dell Technologies, The Business Value of Dell VxRail HCI, the value that VxRail LCM provides to organizations is significant and compelling. The results of this study are major proof points on why you should continue investing in VxRail to mitigate these challenges:
- Overburdened IT staff. The automated LCM and mechanisms in VxRail to maintain cluster integrity throughout the life of the cluster drive significant efficiencies for your IT infrastructure team.
- Unplanned outages that lead to significant disruption to businesses. The benefit of pretested and prevalidated sets of drivers, firmware, and software, which we call VxRail Continuously Validated States, is the significant reduction in risk as you update your HCI cluster from one version to the next.
- More time spent on deploying infrastructure and the resulting slowdown in the pace at which your business can innovate. The automation and integrated validation checks speed up deployment times without compromising security.
The emphasis that we put on improving your LCM experience is extraordinary, and we encourage you to maximize your investment in VxRail. Updating to the latest VxRail software release gives you access to the many LCM enhancements that can drive greater efficiencies in your organization. And with VxRail Continuously Validated States, you can safely get to the next software release and the ones that follow.
If you want to learn about the latest in the VxRail portfolio, you can check the VxRail page on the Dell Technologies website.
Author: Daniel Chiu, VxRail Technical Marketing
Learn About the Latest Major VxRail Software Release: VxRail 7.0.450
Thu, 11 May 2023 16:14:15 -0000
To our many VxRail customers, you know that our innovation train is a constant machine that keeps on delivering more value while keeping you on a continuously validated track. The next stop on your VxRail journey brings you to VxRail 7.0.450 which offers significant benefits to life cycle management and dynamic node clusters.
This blog provides a deep dive into some of the life cycle management enhancements as well as PowerStore Life Cycle Management integration into VxRail Manager for VxRail dynamic node clusters. For a more comprehensive rundown of the features introduced in this release, see the release notes.
Life cycle management
The life cycle management features that I am covering can provide the most impact to our VxRail customers. The first set of features are designed to offer you actionable information at your fingertips. Imagine taking your first sip of coffee or tea as you log onto VxRail Manager at the start of your day, and you immediately have all the up-to-date information that you need to make decisions and plan out your work.
VxRail pre-update health check
The VxRail pre-update health check, or pre-check as the VxRail Manager UI refers to it, has been an important tool for you to determine the overall health of your clusters and assess the readiness for a cluster update. The output of this report helps you to be aware of troublesome areas and provides you with information, such as Knowledge Base articles, to resolve the issues. This tool relies on a script that can be automatically uploaded onto the VxRail Manager VM, if the cluster is securely connected to the Dell cloud, or manually uploaded as a bundle procured from the Dell Support website.
For the health check to stay reliable and improve over time, the development of the health check script needs to incorporate a continuous feedback loop so that the script can easily evolve. Feedback can come from our Dell Services and escalation engineering teams as they learn from support cases, and from the engineering team as new capabilities and additions are introduced to the VxRail offering.
To provide an even more accurate assessment of the cluster health and readiness for a cluster update, the VxRail team has increased the frequency of how often the health check script is updated. Starting with VxRail 7.0.450, clusters that are connected to the Dell cloud will automatically scan for new health check scripts multiple times per day. The health check will automatically run every 24 hours, with the latest script in hand, so that you will have an up-to-date report ready for your review whenever you log onto VxRail Manager. This enhancement has just made the pre-update health check even more reliable and convenient.
For clusters that are not connected to the Dell cloud, you can still benefit from the increased frequency of health script updates. However, you are responsible for checking for any updates on the Dell Support website, downloading them, and staging the script on VxRail Manager for the tool to utilize it.
VxRail cluster update planning
The next enhancement that I will delve into provides a simpler and more convenient cluster update planning experience. VxRail 7.0.450 introduces more automation into the cluster update planning operations, so that you have all the information that you need to plan for an update without manual intervention.
For a cluster connected to the Dell cloud, VxRail Manager will automatically scan for new update paths that are relevant to that particular cluster. This scan happens multiple times a day. If a new update path is found, VxRail Manager will download the lightweight manifest file from that target LCM composite bundle. This file provides the metadata of the LCM composite bundle, including the manifest of the target VxRail Continuously Validated State.
The following figure shows the information of two update paths provided by their manifest files to populate the Internet Updates tab. That information includes the target VxRail software version, estimated cluster update time, link to the release notes, and whether reboots are required for the nodes to complete an update to this target version. (You can disregard the actual software version numbers: these are engineering test builds used to demonstrate the new functionality.)
VxRail Manager, by default, will recommend the next software version on the same software train. For the recommended path, VxRail Manager automatically generates an update advisor report, which is the new feature for cluster update planning. An update advisor report is a single exportable report that consolidates the output from existing planning tools:
- The same metadata of the update path as provided on the Internet Updates tab
- The update advisory report that provides component-by-component change analysis, which helps users build IT infrastructure change reports
- The health check report that was discussed earlier
- The user-managed component report that reminds users whether they need to update non-VxRail managed components for a cluster update
This report is automatically generated every 24 hours so that you can log onto VxRail Manager and have all the up-to-date information at your disposal to make informed decisions. This feature will make your life easier because you no longer have to manually run all these jobs and wait for them to complete!
For a non-recommended update, you can manually generate an update advisor report using the Actions button for the listed update path. For clusters not connected to Dell cloud, you can still benefit from the update advisor report. However, instead of downloading a lightweight manifest file, you would have to download the full LCM bundle from the Dell Support website to generate the report.
The last life cycle management feature that I want to focus on is smart bundles. The term ‘smart bundle’ refers to a space-efficient LCM bundle that can be downloaded from the Dell cloud. If you are using CloudIQ today to manage your VxRail clusters, this feature is familiar to you. A space-efficient bundle is created by first performing a change analysis of the VxRail Continuously Validated State currently running on a cluster versus the target VxRail Continuously Validated State that a user wants to download for their cluster. The change analysis determines which install files in the full LCM bundle the cluster needs to download in order to update to the target version.
In VxRail 7.0.450, you can now initiate smart bundle transfers from VxRail Manager. Smart bundles can greatly reduce the transfer size of an update bundle, which can be extremely beneficial for bandwidth-constrained environments. To use the smart bundle feature, the cluster has to be configured to connect to CloudIQ in the Dell cloud. If VxRail Manager is not properly configured to use the smart bundle feature or if the smart bundle operation fails, VxRail Manager defaults to using the traditional method of downloading the full LCM bundle from the Dell cloud.
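The fallback behavior can be summarized with a small Python sketch. The function, the `SmartBundleError` exception, and the fetch callbacks are hypothetical names used for illustration and are not VxRail Manager APIs.

```python
from typing import Callable, Tuple


class SmartBundleError(Exception):
    """Hypothetical error raised when a smart bundle transfer fails."""


def download_bundle(
    smart_configured: bool,
    fetch_smart: Callable[[], bytes],
    fetch_full: Callable[[], bytes],
) -> Tuple[str, bytes]:
    """Try the smart bundle first and fall back to the full LCM bundle."""
    if smart_configured:
        try:
            return "smart", fetch_smart()
        except SmartBundleError:
            pass  # fall through to the traditional full download
    return "full", fetch_full()


def failing_smart_fetch() -> bytes:
    raise SmartBundleError("transfer interrupted")


kind, payload = download_bundle(True, failing_smart_fetch, lambda: b"full-bundle")
print(kind)
```

The caller gets a usable bundle whether or not the smart path is available, and the returned label records which path was taken.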
VxRail dynamic nodes with PowerStore
VxRail 7.0.450 introduces the much-anticipated integration of PowerStore life cycle management into VxRail Manager for a configuration consisting of VxRail dynamic nodes using PowerStore as the primary storage (also referred to as Dynamic AppsON). This integration further centralizes PowerStore management onto the vCenter Server console for VMware environments. With the Virtual Storage Integrator (VSI) plugin to vCenter, you have been able to provision PowerStore storage and manage data services. Now, you can use the VxRail Manager plugin to manage a PowerStore update and view the array’s software version.
To enable this functionality, VxRail leverages the VSI’s new API server to communicate with the PowerStore Manager and initiate lifecycle management operations and retrieve status information. The API server was developed exclusively for VxRail Manager in a Dynamic AppsON configuration. You start the LCM workflow by first uploading the update bundle to PowerStore Manager, then running an update pre-check, and lastly running the update. The operations are initiated from VxRail Manager but the actual operations are executed on the PowerStore Manager.
The following video shows the PowerStore LCM workflow that can be run from the VxRail Manager. You can update a PowerStore that is using any storage type, except NFS, as the primary storage for a VxRail dynamic node cluster.
Although VxRail 7.0.450 is a jam-packed release with many new features and enhancements, the features I’ve described are the headliners and deserve a deeper dive to unpack the capability set. Overall, the set of LCM enhancements in this release provides immense value for your future cluster management and update experience. For the full list of features introduced in this release, see the release notes. And for more information about VxRail in general, check out the Dell VxRail Hyperconverged Infrastructure page on www.dell.com.
Author: Daniel Chiu