Take Advantage of the Latest Enhancements to VxRail Life Cycle Management
Tue, 20 Jun 2023 16:52:40 -0000
Providing the best life cycle management experience for HCI is not easy, nor is it a one-time job for which we can pat ourselves on the back and move on to the next endeavor. It’s a continuous cycle that incorporates feature enhancements and improvements based on your feedback. While we know that improving VxRail LCM is vitally important for us to continue to deliver differentiating value to you, it is just as important that your clusters continue to run the latest software to realize the benefits. In this post, I’ll provide a deep dive into the LCM enhancements introduced in the past few software releases so you can consider the added functionality that you can benefit from.
Focus areas for improved LCM
Going back into last year, we prioritized four focus areas to improve your LCM experience. While the value is incremental when you look at just a single software release, this post provides a holistic perspective of how VxRail has improved upon LCM over time to further increase the efficiencies that you enjoy today.
- Based on data that we have gathered on reported cluster update failures, we found that almost half of the update failures occurred because a node failed to enter maintenance mode. Effectively addressing this issue can potentially be the most impactful benefit for our customer base.
- As the VxRail footprint expands beyond the data center, resource constraints such as network bandwidth and Internet connectivity can become significant hurdles for effectively deploying infrastructure solutions at the edge. Recent enhancements in VxRail focused on creating space-efficient LCM bundle transfers.
- Doing more with less is a common thread across all organizations and industries. In the context of VxRail LCM, we’re looking to further simplify your cluster update planning experience by putting more actionable information at your fingertips.
- While no product, including VxRail, can avoid a failure from ever happening, VxRail looks to put you in a better position to protect your cluster and quickly recover from a failure.
Figure 1. 12+ month recap of LCM enhancements
Now that you know about the four focus areas, let’s get into the details about the actual improvements that have been introduced in the last 12+ months.
Mitigating maintenance mode failures
In our investigation, we identified three major issues that caused a cluster update to fail because a node did not enter maintenance mode as expected:
- VMware Tools was still mounted on a VM.
- VMs were pinned to a host due to an existing policy.
- vSAN resynchronization was taking too long and exceeded the timeout value.
In VxRail 7.0.350, prechecks were added for the first two issues. When a pre-update health check is run, these new VxRail prechecks identify those issues if they exist and alert you in the report so that you can remedy the issue before initiating a cluster update. In the same release, the timeout value to wait for a node to enter maintenance mode was doubled to reduce the chance that vSAN resynchronization does not finish in time.
Next, the cluster update capability set was also enhanced to address a cluster update failure due to a node not entering maintenance mode as expected. With the combination of enhancements made to cluster update error handling and cluster update retry operations in VxRail 7.0.350 and VxRail 7.0.400 respectively, VxRail is now able to handle this scenario much more efficiently. If a node fails to enter maintenance mode, the cluster update operation now skips the node and continues on to the next node instead of failing out of the operation altogether. Upon running the cluster update retry operation, VxRail can automatically detect which node requires an update instead of updating the entire cluster.
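The skip-and-retry behavior described above can be illustrated with a minimal Python sketch. This is not VxRail code: the function `rolling_update`, its callback parameters, and the node names are all hypothetical, chosen only to show how skipping a node and later retrying just the skipped nodes differs from failing the whole operation.

```python
from typing import Callable, Iterable


def rolling_update(
    nodes: Iterable[str],
    enter_maintenance_mode: Callable[[str], bool],
    update_node: Callable[[str], None],
) -> list[str]:
    """Update nodes one at a time and return the nodes that were skipped."""
    skipped = []
    for node in nodes:
        if not enter_maintenance_mode(node):
            # Skip this node instead of aborting the whole cluster update.
            skipped.append(node)
            continue
        update_node(node)
    return skipped


# Hypothetical cluster: node-2 cannot enter maintenance mode on the first run.
blocked = {"node-2"}
updated = []

skipped = rolling_update(
    ["node-1", "node-2", "node-3"],
    enter_maintenance_mode=lambda n: n not in blocked,
    update_node=updated.append,
)

# After the blocking condition is remedied, the retry targets only skipped nodes.
blocked.clear()
still_skipped = rolling_update(skipped, lambda n: n not in blocked, updated.append)
```

In this sketch the first run updates node-1 and node-3 and reports node-2 as skipped, and the retry processes node-2 alone rather than walking the entire cluster again.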
Space-efficient LCM bundle transfers
The next area of improvement addressed reducing the package sizes of the LCM bundles. A smaller package size can be very beneficial for bandwidth-constrained environments such as edge locations.
VxRail 7.0.350 introduced the capability for you to designate a local Windows client at your data center to be the central repository and distributor of LCM bundles for remote VxRail clusters that are not connected to the Internet. Using a separate PowerShell commandlet installed on the client, you can initiate space-efficient bundle transfers from the client to your remote clusters in your internal network. The transfer operation automatically scans the manifest of the Continuously Validated State (VxRail software version) running on the VxRail cluster and determines the delta compared to the requested LCM bundle. Instead of transferring the full LCM bundle, which is greater than 10 GB in size, it only packages the necessary installation files. A much smaller LCM bundle can cut down on bandwidth usage and transfer times.
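As a rough illustration of the delta idea, the following Python sketch compares two manifests and keeps only the components whose version changes. The manifest format (a mapping of component name to version), the function name `bundle_delta`, and the component names and versions are assumptions made for the example. The actual VxRail manifest structure is not described in this post.

```python
def bundle_delta(installed: dict[str, str], target: dict[str, str]) -> dict[str, str]:
    """Return the target components that are new or differ from what is installed."""
    return {
        name: version
        for name, version in target.items()
        if installed.get(name) != version
    }


# Placeholder manifests: only component-a changes and component-c is new.
installed = {"component-a": "1.0", "component-b": "1.0"}
target = {"component-a": "2.0", "component-b": "1.0", "component-c": "1.1"}

delta = bundle_delta(installed, target)
```

Only the entries in `delta` would need to be packaged and transferred, which is why the resulting bundle can be much smaller than the full one.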
Figure 2. Central repository and distributor of LCM bundles to remote VxRail clusters
In VxRail 7.0.450, space-efficient LCM bundles can also be created when VxRail Manager downloads an LCM bundle from the Dell cloud. This feature requires that the VxRail Manager be connected to the Dell cloud.
Simplified cluster update planning experience
The next set of LCM enhancements is centered around providing you with critical insights to maximize the probability of a successful cluster update and for the information to be up-to-date and readily available whenever you need it.
Since VxRail 7.0.400, the pre-update health check includes a RecoverPoint for VMs compatibility precheck to detect whether its current version of software is compatible with the target VxRail software version.
VxRail 7.0.450 increased the frequency at which the VxRail prechecks file is updated. The increased frequency ensures that any additional prechecks added by engineering because of technology changes or new learnings from support cases are incorporated into the VxRail prechecks file that is run against your cluster. When your cluster is connected to the Dell cloud, VxRail Manager periodically scans for the latest VxRail prechecks file.
VxRail 7.0.450 also automated the health check to run every 24 hours. The combination of automated VxRail prechecks file scans and health check runs ensures that you have access to an up-to-date health check report once you log in to VxRail Manager.
VxRail 7.0.450 also further simplified your cluster update planning experience by consolidating into a single, exportable report all the necessary insights about your cluster to help you decide whether to move forward with a cluster update. This update advisor report has four sections:
- VxRail Update Advisor Report Summary includes the current VxRail version running on the cluster, the target (or selected) VxRail version, estimated duration to complete a cluster update, a link to the release notes, and information about your backup for your service VMs.
Figure 3. Update advisor report—summary report
- VxRail Components shows which components need to be updated to get to the target VxRail version. The table includes the current version and target version for each component.
Figure 4. Update advisor report—components report
- VxRail Precheck is the previously mentioned pre-update health check report, inclusive of all the enhancements discussed.
Figure 5. Update advisor report—LCM precheck report
- VxRail Custom Components is a report that highlights user-managed components installed on the cluster. You should consider these custom components when deciding whether to schedule a cluster update.
Figure 6. Update advisor report—custom components report
When VxRail Manager is connected to the Dell cloud, it automatically scans for new update paths. Once a new update path is detected, VxRail Manager downloads a lightweight manifest file that contains all the information needed to produce the update advisor report. The report is automatically generated every 24 hours. This feature is designed to streamline the availability of up-to-date critical insights to help you make an informed decision about a cluster update.
The last set of LCM enhancements that I will cover is around serviceability. While many of the features discussed earlier are meant to be proactive and to prevent failures, there are times when failures can still occur. Being able to efficiently troubleshoot the issues is critically important to getting your clusters back up and running quickly.
In VxRail 7.0.410, the logging capability was enhanced in a couple of areas so that the Dell Support team can pinpoint issues faster. When a pre-update health check identifies failures, the offending host is now recorded. If a node does fail to enter maintenance mode, the logs now capture the reason for the failure.
In VxRail 7.0.450, we automated the backup of the VxRail Manager VM and vCenter Server VM (if it’s VxRail managed). Now you can easily back up your service VMs before updating a cluster.
Figure 7. Automate VxRail backup of service VMs before a cluster update
This feature is also integrated into the update advisor report, where you can see the latest backup on the report summary and click a link to go to the backup page to create another backup.
Value of VxRail life cycle management
If life cycle management is one of the major reasons that you chose to invest in VxRail, our continuous improvements to life cycle management should be a compelling reason to keep your clusters running the latest software. VxRail life cycle management continues to provide significant value by addressing the challenges that your organization faces today.
Figure 8. VxRail benefits (data from "The Business Value of Dell VxRail HCI," April 2023, IDC)
In an IDC study sponsored by Dell Technologies, The Business Value of Dell VxRail HCI, the value that VxRail LCM provides to organizations is significant and compelling. The results of this study are major proof points on why you should continue investing in VxRail to mitigate these challenges:
- Overburdened IT staff. The automated LCM and mechanisms in VxRail to maintain cluster integrity throughout the life of the cluster drive significant efficiencies in your IT infrastructure team.
- Unplanned outages that lead to significant disruption to businesses. The benefit of pretested and prevalidated sets of drivers, firmware, and software, which we call VxRail Continuously Validated States, is the significant reduction in risk as you update your HCI cluster from one version to the next.
- More time spent on deploying infrastructure and resulting slowdown of pace at which your business can innovate. The automation and integrated validation checks speed up deployment times without compromising security.
The emphasis that we put on improving your LCM experience is extraordinary, and we encourage you to maximize your investment in VxRail. Updating to the latest VxRail software release gives you access to the many LCM enhancements that can drive greater efficiencies in your organization. And with VxRail Continuously Validated States, you can safely get to the next software release and the ones that follow.
For more information about the features in VxRail 7.0.400, check out this blog post:
For more information about the features in VxRail 7.0.450, see this post:
If you want to learn about the latest in the VxRail portfolio, you can check the VxRail page on the Dell Technologies website:
Author: Daniel Chiu, VxRail Technical Marketing
Related Blog Posts
Learn About the Latest Major VxRail Software Release: VxRail 7.0.450
Thu, 11 May 2023 16:14:15 -0000
To our many VxRail customers, you know that our innovation train is a constant machine that keeps on delivering more value while keeping you on a continuously validated track. The next stop on your VxRail journey brings you to VxRail 7.0.450 which offers significant benefits to life cycle management and dynamic node clusters.
This blog provides a deep dive into some of the life cycle management enhancements as well as PowerStore Life Cycle Management integration into VxRail Manager for VxRail dynamic node clusters. For a more comprehensive rundown of the features introduced in this release, see the release notes.
Life cycle management
The life cycle management features that I am covering can provide the most impact to our VxRail customers. The first set of features is designed to offer you actionable information at your fingertips. Imagine taking your first sip of coffee or tea as you log onto VxRail Manager at the start of your day, and you immediately have all the up-to-date information that you need to make decisions and plan out your work.
VxRail pre-update health check
The VxRail pre-update health check, or pre-check as the VxRail Manager UI refers to it, has been an important tool for you to determine the overall health of your clusters and assess the readiness for a cluster update. The output of this report helps you to be aware of troublesome areas and provides you with information, such as Knowledge Base articles, to resolve the issues. This tool relies on a script that can be automatically uploaded onto the VxRail Manager VM, if the cluster is securely connected to the Dell cloud, or manually uploaded as a bundle procured from the Dell Support website.
For the health check to stay reliable and improve over time, the development of the health check script needs to incorporate a continuous feedback loop so that the script can easily evolve. Feedback can come from our Dell Services and escalation engineering teams as they learn from support cases, and from the engineering team as new capabilities and additions are introduced to the VxRail offering.
To provide an even more accurate assessment of the cluster health and readiness for a cluster update, the VxRail team has increased the frequency of how often the health check script is updated. Starting with VxRail 7.0.450, clusters that are connected to the Dell cloud will automatically scan for new health check scripts multiple times per day. The health check will automatically run every 24 hours, with the latest script in hand, so that you will have an up-to-date report ready for your review whenever you log onto VxRail Manager. This enhancement has just made the pre-update health check even more reliable and convenient.
For clusters that are not connected to the Dell cloud, you can still benefit from the increased frequency of health script updates. However, you are responsible for checking for any updates on the Dell Support website, downloading them, and staging the script on VxRail Manager for the tool to utilize it.
VxRail cluster update planning
The next enhancement that I will delve into provides a simpler and more convenient cluster update planning experience. VxRail 7.0.450 introduces more automation into the cluster update planning operations, so that you have all the information that you need to plan for an update without manual intervention.
For a cluster connected to the Dell cloud, VxRail Manager will automatically scan for new update paths that are relevant to that particular cluster. This scan happens multiple times a day. If a new update path is found, VxRail Manager will download the lightweight manifest file from that target LCM composite bundle. This file provides the metadata of the LCM composite bundle, including the manifest of the target VxRail Continuously Validated State.
The following figure shows the information of two update paths provided by their manifest files to populate the Internet Updates tab. That information includes the target VxRail software version, estimated cluster update time, link to the release notes, and whether reboots are required for the nodes to complete an update to this target version. (You can disregard the actual software version numbers: these are engineering test builds used to demonstrate the new functionality.)
VxRail Manager, by default, will recommend the next software version on the same software train. For the recommended path, VxRail Manager automatically generates an update advisor report, which is the new feature for cluster update planning. An update advisor report is a single exportable report that consolidates the output from existing planning tools:
- Same metadata of the update path, as provided on the Internet Updates tab:
- The update advisory report that provides component-by-component change analysis, which helps users build IT infrastructure change reports:
- The health check report that was discussed earlier:
- The user-managed component report that reminds users whether they need to update non-VxRail managed components for a cluster update:
This report is automatically generated every 24 hours so that you can log onto VxRail Manager and have all the up-to-date information at your disposal to make informed decisions. This feature will make your life easier because you no longer have to manually run all these jobs and wait for them to complete!
For a non-recommended update, you can manually generate an update advisor report using the Actions button for the listed update path. For clusters not connected to Dell cloud, you can still benefit from the update advisor report. However, instead of downloading a lightweight manifest file, you would have to download the full LCM bundle from the Dell Support website to generate the report.
The last life cycle management feature that I want to focus on is about smart bundles. The term ‘smart bundle’ refers to a space-efficient LCM bundle that can be downloaded from the Dell cloud. For VxRail users who are using CloudIQ today to manage their VxRail clusters, this feature is familiar to you. A space-efficient bundle is created by first performing a change analysis of the VxRail Continuously Validated State currently running on a cluster versus the target VxRail Continuously Validated State that a user wants to download for their cluster. The change analysis determines the delta of install files in the full LCM bundle that is needed by the cluster to download and update to the target version.
In VxRail 7.0.450, you can now initiate smart bundle transfers from VxRail Manager. Smart bundles can greatly reduce the transfer size of an update bundle, which can be extremely beneficial for bandwidth-constrained environments. To use the smart bundle feature, the cluster has to be configured to connect to CloudIQ in the Dell cloud. If VxRail Manager is not properly configured to use the smart bundle feature or if the smart bundle operation fails, VxRail Manager defaults to using the traditional method of downloading the full LCM bundle from the Dell cloud.
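The fallback behavior described above follows a common pattern: try the space-efficient path first and fall back to the full download if it is unavailable or fails. The Python sketch below illustrates that pattern only. The function `download_bundle`, its parameters, and the stand-in download callables are hypothetical and do not represent the VxRail Manager implementation or API.

```python
from typing import Callable


def download_bundle(
    smart_download: Callable[[], bytes],
    full_download: Callable[[], bytes],
    smart_enabled: bool,
) -> tuple[str, bytes]:
    """Try the smart bundle first; fall back to the full bundle on any failure."""
    if smart_enabled:
        try:
            return "smart", smart_download()
        except Exception:
            # The smart bundle operation failed; use the traditional method.
            pass
    return "full", full_download()


def failing_smart_download() -> bytes:
    # Stand-in for a smart bundle transfer that fails partway through.
    raise ConnectionError("smart bundle transfer failed")


# Case 1: smart bundles are configured but the operation fails.
method_on_failure, _ = download_bundle(failing_smart_download, lambda: b"full", True)

# Case 2: the cluster is not configured for smart bundles.
method_unconfigured, _ = download_bundle(lambda: b"delta", lambda: b"full", False)

# Case 3: smart bundles are configured and the operation succeeds.
method_ok, payload_ok = download_bundle(lambda: b"delta", lambda: b"full", True)
```

In all three cases the caller ends up with a usable bundle. Only the transfer size differs.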
VxRail dynamic nodes with PowerStore
VxRail 7.0.450 introduces the much-anticipated integration of PowerStore life cycle management into VxRail Manager for a configuration consisting of VxRail dynamic nodes using PowerStore as the primary storage (also referred to as Dynamic AppsON). This integration further centralizes PowerStore management onto the vCenter Server console for VMware environments. With the Virtual Storage Integrator (VSI) plugin to vCenter, you have been able to provision PowerStore storage and manage data services. Now, you can use the VxRail Manager plugin to manage a PowerStore update and view the array’s software version.
To enable this functionality, VxRail leverages the VSI’s new API server to communicate with the PowerStore Manager and initiate lifecycle management operations and retrieve status information. The API server was developed exclusively for VxRail Manager in a Dynamic AppsON configuration. You start the LCM workflow by first uploading the update bundle to PowerStore Manager, then running an update pre-check, and lastly running the update. The operations are initiated from VxRail Manager but the actual operations are executed on the PowerStore Manager.
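The three-step sequence above (upload, pre-check, update) can be thought of as an ordered workflow that stops at the first step that does not succeed. The following Python sketch models only that ordering. The function `run_workflow` and the step callables are hypothetical placeholders and do not correspond to the VSI API server or PowerStore Manager interfaces, which are not documented in this post.

```python
from typing import Callable


def run_workflow(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run steps in order, stop at the first failure, and return completed step names."""
    completed = []
    for name, step in steps:
        if not step():
            break
        completed.append(name)
    return completed


# Hypothetical run in which the pre-check reports a problem,
# so the update step is never started.
completed = run_workflow(
    [
        ("upload bundle", lambda: True),
        ("pre-check", lambda: False),
        ("update", lambda: True),
    ]
)
```

Running the pre-check as a gate before the update is the point of the ordering: a failed pre-check leaves the array untouched.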
The following video shows the PowerStore LCM workflow that can be run from the VxRail Manager. You can update a PowerStore that is using any storage type, except NFS, as the primary storage for a VxRail dynamic node cluster.
Although VxRail 7.0.450 is a jam-packed release with many new features and enhancements, the features I’ve described are the headliners and deserve a deeper dive to unpack the capability set. Overall, the set of LCM enhancements in this release provides immense value for your future cluster management and update experience. For the full list of features introduced in this release, see the release notes. And for more information about VxRail in general, check out the Dell VxRail Hyperconverged Infrastructure page on www.dell.com.
Author: Daniel Chiu
Learn About the Latest Major VxRail Software Release: VxRail 7.0.400
Thu, 22 Sep 2022 13:11:44 -0000
As many parts of the world welcome the fall season and the cooler temperatures that it brings, one area that has not cooled down is VxRail. The latest VxRail software release, 7.0.400, introduces a slew of new features that will surely fire up our VxRail customers and spur them to schedule their next cluster update.
VxRail 7.0.400 provides support for VMware ESXi 7.0 Update 3g and VMware vCenter Server 7.0 Update 3g. All existing platforms that support VxRail 7.0 can upgrade to VxRail 7.0.400. Upgrades from VxRail 4.5 and 4.7 are supported, which is an important consideration because standard support from Dell for those versions ends on September 30.
VxRail 7.0.400 software introduces features in the following areas:
- Life cycle management
- Dynamic nodes
- Configuration flexibility
This blog delves into major enhancements in those areas. For a more comprehensive rundown of the features added to this release, see the release notes.
Life cycle management
Because life cycle management is a key area of value differentiation for our VxRail customers, the VxRail team is continuously looking for ways to further enhance the life cycle management experience. One aspect that has come into recent focus is handling cluster update failures caused by VxRail nodes failing to enter maintenance mode.
During a cluster update, nodes are put into maintenance mode one at a time. Their workloads are moved onto the remaining nodes in the cluster to maintain availability while the nodes go through software, firmware, and driver updates. VxRail 7.0.350 introduced capabilities to notify users of situations such as host pinning and mounted VMware Tools on the host that can cause nodes to fail to enter maintenance mode, so users can address those situations before initiating a cluster update.
VxRail 7.0.400 addresses this cluster update failure scenario even further by being smarter with how it handles this issue once the cluster update is in operation. If a node fails to enter maintenance mode, VxRail automatically skips that node and moves onto the next node. Previously, this scenario would cause the cluster update operation to fail. Now, users can run that cluster update and process as many nodes as possible. Users can then run a cluster update retry, which targets only the nodes that were skipped. The combination of skipping nodes and targeted retry of those skipped nodes significantly improves the cluster update experience.
Figure 1: Addressing nodes failing to enter maintenance mode
In VxRail 7.0.400, a Dell RecoverPoint for VMs compatibility check has been added to the update advisory report, cluster update pre-check, and cluster update operation to inform users of a potential incompatibility scenario. Having data protection in an unsupported state puts an environment at risk. The addition of the compatibility check is great news for RecoverPoint for VMs users because this previously manual task is now automated, helping to reduce risk and streamline operations.
VxRail dynamic nodes
Since the introduction of VxRail dynamic nodes last year, we’ve incrementally added more storage protocol support for increased flexibility. NFS, CIFS, and iSCSI support were added earlier this year. In VxRail 7.0.400, users can configure their VxRail dynamic nodes with storage from Dell PowerStore using NVMe over Fabrics over TCP (NVMe-oF/TCP). NVMe provides much faster data access compared to SATA and SAS. The support requires Dell PowerStoreOS 2.1 or later and Dell PowerSwitch with the virtual Dell SmartFabric Storage Service appliance.
VxRail cluster deployment using NVMe-oF/TCP is not much different from setting up iSCSI storage as the primary datastore for VxRail dynamic node clusters. The cluster must go through the Day 1 bring-up activities to establish IP connectivity. From there, the user can then set up the port group, VMkernel adapters, and NVMe-oF/TCP adapter to access the storage shared from the PowerStore.
Setting up NVMe-oF/TCP between the VxRail dynamic node cluster and PowerStore is separate from the cluster deployment activities. You can find more information about deploying NVMe-oF/TCP here: https://infohub.delltechnologies.com/t/smartfabric-storage-software-deployment-guide/.
VxRail 7.0.400 also adds VMware Virtual Volumes (vVols) support for VxRail dynamic nodes. Cluster deployment with vVols over Fibre Channel follows a workflow similar to cluster deployment with a VMFS datastore. Provisioning and zoning of the Virtual Volume need to be done before the Day 1 bring-up. The VxRail Manager VM is installed onto the datastore as part of the Day 1 bring-up.
For vVols over IP, the Day 1 bring-up needs to be completed first to establish IP connectivity. Then the Virtual Volume can be mounted and a datastore can be created from it for the VxRail Manager VM.
Figure 2: Workflow to set up VxRail dynamic node clusters with VMware Virtual Volumes
VxRail 7.0.400 introduces the option for customers to deploy a local VxRail managed vCenter Server with their VxRail dynamic node cluster. The Day 1 bring-up installs a vCenter Server onto the cluster with a 60-day evaluation license, but the customer is required to purchase their own vCenter Server license. VxRail customers are accustomed to having a Standard edition vCenter Server license packaged with their VxRail purchase. However, that vCenter Server license is bundled with the VMware vSAN license, not the VMware vSphere license.
VxRail 7.0.400 supports the use of Dell PowerPath/VE with VxRail dynamic nodes, which is important to many storage customers who have been relying on PowerPath software for multipathing capabilities. With VxRail 7.0.400, VxRail dynamic nodes can use PowerPath with PowerStore, PowerMax, or Unity XT storage array via NFS, iSCSI, or NVMe over Fibre Channel storage protocol.
Another topic that continues to burn bright, no matter the season, is security. As threats continue to evolve, it’s important to continue to advance security measures for the infrastructure. VxRail 7.0.400 introduces capabilities that make it even easier for customers to further protect their clusters.
While the security configuration rules set forth by the Security Technical Implementation Guide (STIG) are required for customers working in or with the U.S. federal government and Department of Defense, other customers can benefit from hardening their own clusters. VxRail 7.0.400 automatically applies a subset of the STIG rules on all VxRail clusters. These rules protect VM controls and the underlying SUSE Linux operating system controls. Application of the rules occurs without any user intervention upon an upgrade to VxRail 7.0.400 and at the cluster deployment with this software version, providing a seamless experience. This feature increases the security baseline for all VxRail clusters starting with VxRail 7.0.400.
Digital certificates are used to verify the external communication between trusted entities. VxRail customers have two options for digital certificates. Self-signed certificates use the VxRail as the certificate authority to sign the certificate. Customers use this option if they don’t need a Certificate Authority or choose not to pay for the service. Otherwise, customers can import a certificate signed by a Certificate Authority to the VxRail Manager. Both options require certificates to be shared between the VxRail Manager and vCenter Server for secure communication to manage the cluster.
Previously, both options required manual intervention, at varying levels, to manage certificate renewals and ensure uninterrupted communication between the VxRail Manager and the vCenter Server. Loss of communication can affect cluster management operations, though not the application workloads.
Figure 3: Workflow for managing certificates
With VxRail 7.0.400, all areas of managing certificates have been simplified to make it easier and safer to import and manage certificates over time. Now, VxRail certificates can be imported via the VxRail Manager and API. There’s an API to import the vCenter certificate into the VxRail trust store. Renewals can be managed automatically via the VxRail Manager so that customers do not need to constantly check expiring certificates and replace certificates. Alternatively, new API calls have been created to perform these activities. While these features simplify the experience for customers already using certificates, hopefully the simplified certificate management will encourage more customers to use it to further secure their environment.
VxRail 7.0.400 also introduces end-to-end upgrade bundle integrity check. This feature has been added to the pre-update health check and the cluster update operation. The signing certificate is verified to ensure the validity of the root certificate authority. The digital certificate is verified. The bundle manifest is also checked to ensure that the contents in the bundle have not been altered.
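To make the manifest check concrete, here is a minimal Python sketch of one common way to detect altered bundle contents: comparing each file's digest against the value recorded in a manifest. The use of SHA-256, the manifest layout (file name mapped to hex digest), the function name `altered_files`, and the file names are all assumptions for illustration. This post does not specify how VxRail performs the check, and the sketch does not cover the signing certificate verification.

```python
import hashlib


def altered_files(manifest: dict[str, str], contents: dict[str, bytes]) -> list[str]:
    """Return the files that are missing or whose SHA-256 digest differs from the manifest."""
    bad = []
    for name, expected_digest in manifest.items():
        data = contents.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected_digest:
            bad.append(name)
    return bad


# Hypothetical bundle contents and the manifest generated from them.
files = {"a.bin": b"firmware", "b.bin": b"driver"}
manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

# A copy of the bundle in which one file was modified after the manifest was created.
tampered = {**files, "b.bin": b"driver-modified"}

clean_result = altered_files(manifest, files)
tampered_result = altered_files(manifest, tampered)
```

A manifest check of this kind is only trustworthy if the manifest itself is authenticated, which is why it is paired with the certificate verification described above.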
With any major VxRail software release comes enhancements in configuration flexibility. VxRail 7.0.400 provides more flexibility for base networking and more flexibility in using and managing satellite nodes.
Previous VxRail software releases introduced long-awaited support for dynamic link aggregation for vSAN and vSphere vMotion traffic and support for two vSphere Distributed Switches (VDS) to separate management traffic from vSAN and vMotion traffic. VxRail 7.0.400 removes the previous port count restriction of four ports for base networking. Customers can now also deploy clusters with six or eight ports for base networking while employing link aggregation or multiple VDS, or both.
Figure 4: Two VDS with six NIC ports
Figure 5: Two VDS with eight NIC ports with link redundancy for vMotion traffic and link aggregation for vSAN traffic
With VxRail 7.0.400, customers can convert their vSphere Standard Switch on their satellite nodes to a customer-managed VDS after deployment. This support allows customers to more easily manage their VDS and satellite nodes at scale.
The most noteworthy serviceability enhancement I want to mention is the ability to create service tickets from the VxRail Manager UI. This functionality makes it easier for customers to submit service tickets, which can speed resolution time and improve the feedback loop for providing product improvement suggestions. This feature requires an active connection with the Embedded Service Enabler to Dell Support Services. Customers can submit up to five attachments to support a service ticket.
Figure 6: Input form to create a service request
VxRail 7.0.400 is no doubt one of the more feature-heavy VxRail software releases in some time. Customers big and small will find value in the capability set. This software release enhances existing features while also introducing new tools that further focus on VxRail operational simplicity. While this blog covers the highlights of this release, I recommend that you review the release notes to further understand all the capabilities in VxRail 7.0.400.