Take Advantage of the Latest Enhancements to VxRail Life Cycle Management
Wed, 24 Apr 2024 12:07:35 -0000
Providing the best life cycle management experience for HCI is not easy, nor is it a one-time job for which we can pat ourselves on the back and move on to the next endeavor. It’s a continuous cycle that incorporates feature enhancements and improvements based on your feedback. While we know that improving VxRail LCM is vitally important for us to continue to deliver differentiating value to you, it is just as important that your clusters continue to run the latest software to realize the benefits. In this post, I’ll provide a deep dive into the LCM enhancements introduced in the past few software releases so you can consider the added functionality that you can benefit from.
Focus areas for improved LCM
Starting last year, we prioritized four focus areas to improve your LCM experience. While the value of any single software release is incremental, this post provides a holistic perspective on how VxRail has improved LCM over time to further increase the efficiencies that you enjoy today.
- Based on data that we have gathered on reported cluster update failures, we found that almost half of the update failures occurred because a node failed to enter maintenance mode. Effectively addressing this issue can potentially be the most impactful benefit for our customer base.
- As the VxRail footprint expands beyond the data center, resource constraints such as network bandwidth and Internet connectivity can become significant hurdles for effectively deploying infrastructure solutions at the edge. Recent enhancements in VxRail focused on creating space-efficient LCM bundle transfers.
- Doing more with less is a common thread across all organizations and industries. In the context of VxRail LCM, we’re looking to further simplify your cluster update planning experience by putting more actionable information at your fingertips.
- While no product, including VxRail, can guarantee that a failure will never happen, VxRail looks to put you in a better position to protect your cluster and quickly recover from a failure.
Figure 1. 12+ month recap of LCM enhancements
Now that you know about the four focus areas, let’s get into the details about the actual improvements that have been introduced in the last 12+ months.
Mitigating maintenance mode failures
In our investigation, we identified three major issues that caused a cluster update to fail because a node did not enter maintenance mode as expected:
- VMtools was still mounted on a VM.
- VMs were pinned to a host due to an existing policy.
- vSAN resynchronization was taking too long and exceeded the timeout value.
In VxRail 7.0.350, prechecks were added for the first two issues. When you run a pre-update health check, these new VxRail prechecks detect either issue if it exists and flag it in the report so that you can remedy it before initiating a cluster update. In the same release, the timeout for a node to enter maintenance mode was doubled, reducing the chance that vSAN resynchronization runs past it.
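To make the first two prechecks concrete, here is a minimal Python sketch of the idea: scan an inventory and flag VMs that could keep their host out of maintenance mode. The `VM` and `Finding` records and the `maintenance_mode_prechecks` function are hypothetical illustrations, not a VxRail or vSphere API, and the doubled timeout is not modeled.

```python
from dataclasses import dataclass


# Hypothetical, simplified inventory record; this is NOT a VxRail or vSphere API.
@dataclass
class VM:
    name: str
    host: str
    vmtools_mounted: bool = False  # VMtools media still mounted on the VM
    pinned_to_host: bool = False   # an existing policy ties the VM to its host


@dataclass
class Finding:
    host: str
    vm: str
    issue: str


def maintenance_mode_prechecks(vms):
    """Return findings for VMs that could keep their host out of maintenance mode."""
    findings = []
    for vm in vms:
        if vm.vmtools_mounted:
            findings.append(Finding(vm.host, vm.name, "VMtools still mounted"))
        if vm.pinned_to_host:
            findings.append(Finding(vm.host, vm.name, "VM pinned to host by policy"))
    return findings


# Example: surface the issues so they can be fixed before the cluster update starts.
inventory = [
    VM("web01", "node-1", vmtools_mounted=True),
    VM("db01", "node-2", pinned_to_host=True),
    VM("app01", "node-3"),
]
for f in maintenance_mode_prechecks(inventory):
    print(f"{f.host}: {f.vm} -> {f.issue}")
```

Recording the host alongside each VM mirrors the point of a precheck: you learn which node would have blocked the update before the update begins.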
Next, the cluster update capability set was enhanced to better handle a node that does not enter maintenance mode as expected. With the cluster update error-handling enhancements in VxRail 7.0.350 and the cluster update retry enhancements in VxRail 7.0.400, VxRail now handles this scenario much more efficiently. If a node fails to enter maintenance mode, the cluster update operation skips that node and continues to the next one instead of failing the entire operation. When you run the cluster update retry operation, VxRail automatically detects which nodes still require an update instead of updating the entire cluster.
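The skip-and-continue behavior and the targeted retry can be sketched as a simple loop. This is a minimal Python illustration under stated assumptions: the `Node` model, the simulated `can_enter_maintenance` flag, and the `update_cluster` function are hypothetical, and the version strings are placeholders rather than a real update path.

```python
from dataclasses import dataclass


# Hypothetical node model for illustration only; not a VxRail API.
@dataclass
class Node:
    name: str
    version: str
    can_enter_maintenance: bool = True  # simulated maintenance-mode outcome


def update_cluster(nodes, target):
    """Update nodes one at a time, skipping (not aborting on) any node that
    fails to enter maintenance mode. Returns (updated, skipped) node names."""
    updated, skipped = [], []
    for node in nodes:
        if node.version == target:
            continue  # already at the target version, e.g. on a retry
        if not node.can_enter_maintenance:
            skipped.append(node.name)  # record it and move on to the next node
            continue
        node.version = target  # stand-in for the real per-node update
        updated.append(node.name)
    return updated, skipped


cluster = [
    Node("node-1", "7.0.350"),
    Node("node-2", "7.0.350", can_enter_maintenance=False),
    Node("node-3", "7.0.350"),
]
print(update_cluster(cluster, "7.0.400"))  # (['node-1', 'node-3'], ['node-2'])

# After the blocking issue on node-2 is fixed, a retry touches only node-2.
cluster[1].can_enter_maintenance = True
print(update_cluster(cluster, "7.0.400"))  # (['node-2'], [])
```

Because each node's state is checked on every pass, a retry naturally skips nodes that already reached the target version, which is the behavior described above.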
Space-efficient LCM bundle transfers
The next area of improvement addressed reducing the package sizes of the LCM bundles. A smaller package size can be very beneficial for bandwidth-constrained environments such as edge locations.
VxRail 7.0.350 introduced the capability for you to designate a local Windows client at your data center as the central repository and distributor of LCM bundles for remote VxRail clusters that are not connected to the Internet. Using a separate PowerShell cmdlet installed on the client, you can initiate space-efficient bundle transfers from the client to your remote clusters over your internal network. The transfer operation automatically scans the manifest of the Continuously Validated State (VxRail software version) running on the VxRail cluster and determines the delta compared to the requested LCM bundle. Instead of transferring the full LCM bundle, which is greater than 10 GB in size, it packages only the necessary installation files. A much smaller LCM bundle cuts down on bandwidth usage and transfer times.
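The delta idea, comparing what the cluster runs against what the requested bundle contains, can be sketched in a few lines. The real transfer uses a PowerShell cmdlet, and its manifest format is not described here; this Python sketch assumes a simple component-to-version mapping, and the component names, versions, and file names are hypothetical.

```python
def delta_files(installed, bundle):
    """Select only the bundle files the cluster actually needs.

    installed: component -> version, as read from the cluster's current manifest
    bundle:    component -> (version, file) in the requested LCM bundle
    """
    return sorted(
        file
        for component, (version, file) in bundle.items()
        if installed.get(component) != version  # new or changed component
    )


installed = {"hypervisor": "1.0", "manager": "2.0", "firmware": "3.0"}
bundle = {
    "hypervisor": ("1.1", "hypervisor-1.1.pkg"),
    "manager": ("2.0", "manager-2.0.pkg"),
    "firmware": ("3.0", "firmware-3.0.pkg"),
    "driver": ("4.0", "driver-4.0.pkg"),
}
print(delta_files(installed, bundle))  # ['driver-4.0.pkg', 'hypervisor-1.1.pkg']
```

Components already at the bundle's version are left out of the payload, which is why the transferred package can be much smaller than the full bundle.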
Figure 2. Central repository and distributor of LCM bundles to remote VxRail clusters
In VxRail 7.0.450, space-efficient LCM bundles can also be created when VxRail Manager downloads an LCM bundle from the Dell cloud. This feature requires that the VxRail Manager be connected to the Dell cloud.
Simplified cluster update planning experience
The next set of LCM enhancements is centered around providing you with critical insights to maximize the probability of a successful cluster update and for the information to be up-to-date and readily available whenever you need it.
Since VxRail 7.0.400, the pre-update health check includes a RecoverPoint for VMs compatibility precheck that detects whether the installed version of RecoverPoint for VMs is compatible with the target VxRail software version.
VxRail 7.0.450 increased the frequency at which the VxRail prechecks file is updated. The increased frequency ensures that any additional prechecks added by engineering because of technology changes or new learnings from support cases are incorporated into the VxRail prechecks file that is run against your cluster. When your cluster is connected to the Dell cloud, VxRail Manager periodically scans for the latest VxRail prechecks file.
VxRail 7.0.450 also automated the health check to run every 24 hours. The combination of automated VxRail prechecks file scans and health check runs ensures that you have access to an up-to-date health check report whenever you log in to VxRail Manager.
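One way to picture how the two automations combine is a scheduler pass that first picks up a newer prechecks file (only when connected to the Dell cloud) and then re-runs the health check once 24 hours have elapsed. This is a hypothetical Python sketch: the integer prechecks versions, the `SchedulerState` record, and the `tick` function are assumptions, and how often VxRail Manager actually scans for the prechecks file is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

HEALTH_CHECK_INTERVAL = timedelta(hours=24)  # health check cadence described above


@dataclass
class SchedulerState:
    prechecks_version: int = 0                      # prechecks file on VxRail Manager (hypothetical numbering)
    last_health_check: Optional[datetime] = None
    report_prechecks_version: Optional[int] = None  # prechecks file used by the latest report


def tick(state, now, connected, published_version):
    """One pass of a hypothetical scheduler; returns True if a health check ran."""
    # Pick up a newer prechecks file only when connected to the Dell cloud.
    if connected and published_version > state.prechecks_version:
        state.prechecks_version = published_version
    # Re-run the health check once 24 hours have passed since the previous run.
    due = (state.last_health_check is None
           or now - state.last_health_check >= HEALTH_CHECK_INTERVAL)
    if due:
        state.last_health_check = now
        state.report_prechecks_version = state.prechecks_version
    return due


s = SchedulerState()
t0 = datetime(2024, 4, 1, 8, 0)
print(tick(s, t0, True, 5))                        # True: first run uses prechecks v5
print(tick(s, t0 + timedelta(hours=6), True, 6))   # False: v6 fetched, report still v5
print(tick(s, t0 + timedelta(hours=24), True, 6))  # True: report now reflects v6
```

Under these assumptions, a precheck added by engineering shows up in your report within at most one health check cycle, without any manual action.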
VxRail 7.0.450 also further simplified your cluster update planning experience by consolidating into a single, exportable report all the necessary insights about your cluster to help you decide whether to move forward with a cluster update. This update advisor report has four sections:
- VxRail Update Advisor Report Summary includes the current VxRail version running on the cluster, the target (or selected) VxRail version, the estimated duration to complete a cluster update, a link to the release notes, and information about the backup of your service VMs.
Figure 3. Update advisor report—summary report
- VxRail Components shows which components need to be updated to get to the target VxRail version. The table includes the current version and target version for each component.
Figure 4. Update advisor report—components report
- VxRail Precheck is the previously mentioned pre-update health check report, inclusive of all the enhancements discussed.
Figure 5. Update advisor report—LCM precheck report
- VxRail Custom Components is a report that highlights user-managed components installed on the cluster. You should consider these custom components when deciding whether to schedule a cluster update.
Figure 6. Update advisor report—custom components report
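The components and custom components sections boil down to comparing the cluster's current inventory with the target version's manifest. The Python sketch below assumes both are simple component-to-version mappings; the component names and versions are hypothetical, and treating "installed but not in the target manifest" as custom is an assumed heuristic, not necessarily how VxRail identifies user-managed components.

```python
def components_report(current, target):
    """Rows of (component, current version, target version) for components that change."""
    rows = []
    for component in sorted(target):
        installed = current.get(component, "not installed")
        if installed != target[component]:
            rows.append((component, installed, target[component]))
    return rows


def custom_components(current, target):
    """Assumed heuristic: installed components that the target version does not manage."""
    return sorted(set(current) - set(target))


current = {"firmware": "1.0", "hypervisor": "2.0", "manager": "3.0", "third-party-agent": "9.1"}
target = {"firmware": "1.1", "hypervisor": "2.0", "manager": "3.1"}
for row in components_report(current, target):
    print(row)
print(custom_components(current, target))  # ['third-party-agent']
```

Listing only changing components keeps the table focused on what the update will actually touch, while the custom list highlights what you may need to check separately.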
When VxRail Manager is connected to the Dell cloud, it automatically scans for new update paths. Once a new update path is detected, VxRail Manager downloads a lightweight manifest file that contains all the information needed to produce the update advisor report. The report is automatically generated every 24 hours. This feature is designed to streamline the availability of up-to-date critical insights to help you make an informed decision about a cluster update.
Serviceability
The last set of LCM enhancements that I will cover is around serviceability. While many of the features discussed earlier are meant to be proactive and to prevent failures, there are times when failures can still occur. Being able to efficiently troubleshoot the issues is critically important to getting your clusters back up and running quickly.
In VxRail 7.0.410, the logging capability was enhanced in a couple of areas so that the Dell Support team can pinpoint issues faster. When a pre-update health check identifies failures, the offending host is now recorded. If a node does fail to enter maintenance mode, the logs now capture the reason for the failure.
In VxRail 7.0.450, we automated the backup of the VxRail Manager VM and vCenter Server VM (if it’s VxRail managed). Now you can easily back up your service VMs before updating a cluster.
Figure 7. Automate VxRail backup of service VMs before a cluster update
This feature is also integrated into the update advisor report, where you can see the latest backup on the report summary and click a link to go to the backup page to create another backup.
Value of VxRail life cycle management
If life cycle management is one of the major reasons that you chose to invest in VxRail, our continuous improvements to life cycle management should be a compelling reason to keep your clusters running the latest software. VxRail life cycle management continues to provide significant value by addressing the challenges that your organization faces today.
Figure 8. VxRail benefits (data from "The Business Value of Dell VxRail HCI," April 2023, IDC)
An IDC study sponsored by Dell Technologies, The Business Value of Dell VxRail HCI, found that the value VxRail LCM provides to organizations is significant and compelling. The results of this study are major proof points for why you should continue investing in VxRail to mitigate these challenges:
- Overburdened IT staff. The automated LCM in VxRail, together with its mechanisms to maintain cluster integrity throughout the life of the cluster, drives significant efficiencies in your IT infrastructure team.
- Unplanned outages that lead to significant disruption to businesses. The benefit of pretested and prevalidated sets of drivers, firmware, and software, which we call VxRail Continuously Validated States, is a significant reduction in risk as you update your HCI cluster from one version to the next.
- More time spent deploying infrastructure, and the resulting slowdown in the pace at which your business can innovate. The automation and integrated validation checks speed up deployment times without compromising security.
Conclusion
The emphasis that we put on improving your LCM experience is extraordinary, and we encourage you to maximize your investment in VxRail. Updating to the latest VxRail software release gives you access to the many LCM enhancements that can drive greater efficiencies in your organization. And with VxRail Continuously Validated States, you can safely get to the next software release and the ones that follow.
Resources
For more information about the features in VxRail 7.0.400, check out this blog post:
https://infohub.delltechnologies.com/p/learn-about-the-latest-major-vxrail-software-release-vxrail-7-0-400/
For more information about the features in VxRail 7.0.450, see this post:
https://infohub.delltechnologies.com/p/learn-about-the-latest-major-vxrail-software-release-vxrail-7-0-450/
If you want to learn about the latest in the VxRail portfolio, you can check the VxRail page on the Dell Technologies website:
https://www.dell.com/en-us/dt/converged-infrastructure/vxrail/index.htm
Author: Daniel Chiu, VxRail Technical Marketing
https://www.linkedin.com/in/daniel-chiu-8422287/