Easing Life Cycle Management with VxRail
Thu, 13 Oct 2022 22:47:20 -0000
This is the second article in a series introducing VxRail concepts.
I mentioned in the introduction blog that I previously worked in Technical Support for Dell. That experience really set the stage for me to embrace VxRail because the VxRail approach to life cycle management eases a lot of the pain points I saw in support engagements. Many of the issues I saw were resolved with system updates, and VxRail makes moving through the life cycle significantly easier than with traditional hardware or an internally built solution. We do this with our state management model, known as Continuously Validated States. Let’s take some time to understand what these are, because they help enable VxRail customers to do more with their infrastructure more easily than before.
Defining a state
I’m someone who likes to be thorough, so if you already understand what a system state is, then you can skip this section. But for readers newer to infrastructure, this might be a different way to think about things. A system state, be it good or bad, refers to the hardware, firmware, drivers, and system software that power the infrastructure. When your servers or clusters are in a “good” or “happy” state, then everything is working optimally. A “bad” or “faulty” state might have a compatibility issue creating crashes, or it might contain failed hardware. Replacing failed hardware is an example of modifying the hardware state. Modifying the software state might look like an update to VMware software. All these changes then represent new individual states.
VxRail takes the chaos out of traditional state management for customers and replaces it with confidence. VxRail Continuously Validated States make the exchange from chaos to confidence possible. Updating a cluster, such as to a new vCenter version, means changing a cluster, and that change introduces uncertainty. That uncertainty is natural because customers are moving their infrastructure into new unknown configurations.
Let’s discuss the “Validated” portion of Continuously Validated States. VxRail engineering validates the current state, the state you intend to go to, and the continuity through the update cycle. Customers can gain tremendous value by relying on VxRail Engineering to validate all three aspects of an upgrade. This is the “Validated” part of Continuously Validated States that completely inverts the experience I got used to while working in Technical Support.
Moving to a new state
When you make a change, such as adding a driver or updating system software, you are modifying the system state. Changing system states has always been a problem, and the different remediation strategies have each revealed new IT challenges. I believe the challenge that Continuously Validated States best addresses can be described as, “I need my infrastructure to help me respond to new business needs and make moving through the life cycle as easy as possible.” Modifying an HCI cluster designed internally would present additional difficulties because you don’t know what kind of behavior to expect without testing.
This kind of change anxiety is what the validation step in our state-creation process aims to correct. Before the VxRail Engineering team releases a new VxRail update package (a package that would change your cluster’s system state), the package is tested in the team’s dedicated testing facility for nearly 800,000 cumulative hours. The facility has comprehensive access to the hardware that VxRail supports, allowing thorough testing. The purpose of this testing is to first ensure that all the new supported configurations are stable and then ensure that the move from old cluster states to the new states is a reliable process.
Lifecycle continuity
The creation of a series of known-good configurations isn’t the only benefit VxRail can provide with this different approach to state management. Let’s talk about the continuity that Continuously Validated States provide. VxRail clusters spend their entire lives conforming with and moving between different configurations supported and defined by the Continuously Validated State. This creates a continuity that begins from the time a cluster is first unloaded from the truck, persists through the changes of both the update cycle and hardware modification, and continues on to the final point of cluster retirement.
Let’s tie these ideas together. I like to think of Continuously Validated States as being like a GPS that helps avoid road construction during a cluster’s life. VxRail can do this because our engineering teams are building the roads and identifying the best routes. Go ahead and imagine a map for me. I like to imagine a map of my home state. No matter what kind of map, it’s going to have a bunch of points and show you how to move from one point to another. Continuously Validated States serve a similar role for your clusters. Much like the points on your map, each of these states verifies new hardware and software versions for customers to move their clusters to. These states serve another role like that of a GPS—they help identify the ideal paths between states and help clusters efficiently move between them. As you might have guessed, the Continuously Validated States model isn’t simple cartography. This ideal path is identified through hundreds of thousands of testing hours performed by VxRail Engineering team members in a massive million-dollar lab environment. Those movement paths, in combination with software tooling in the update process, create continuity for clusters as they move between states and proceed through their life cycles.
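To make the map analogy a little more concrete, here is a minimal sketch that treats states as points and validated moves as roads, then searches for the shortest route between two points. The state names, the VALIDATED_PATHS table, and the find_update_route function are all made up for illustration; this is not how VxRail stores or computes its update paths.

```python
from collections import deque

# Hypothetical map: each state lists the states it can move to directly.
# These names and roads are invented for illustration only.
VALIDATED_PATHS = {
    "state-A": ["state-B", "state-C"],
    "state-B": ["state-C", "state-D"],
    "state-C": ["state-D"],
    "state-D": [],
}


def find_update_route(paths, current, target):
    """Return the shortest list of states from current to target, or None."""
    queue = deque([[current]])
    visited = {current}
    while queue:
        route = queue.popleft()
        if route[-1] == target:
            return route
        for next_state in paths.get(route[-1], []):
            if next_state not in visited:
                visited.add(next_state)
                queue.append(route + [next_state])
    return None  # no validated road leads to the target


route = find_update_route(VALIDATED_PATHS, "state-A", "state-D")
print(route)
```

A breadth-first search is used here only because it finds the route with the fewest hops; the real work of deciding which roads exist at all is what the testing hours are for.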
Conclusion
Hopefully, this blog has helped show how Continuously Validated States change configuration management for the better. Changing the configuration state of production clusters is an anxiety-generating action that VxRail eases by creating, testing, and validating known-good configuration states for customers. The result is that customers can update their equipment with more confidence than ever and spend more IT resources on enabling business projects than on performing maintenance tasks. Mike Athanasiou, a colleague of mine, did a fantastic job with our Interactive Journey video series. In the videos, Mike shows how the use of Continuously Validated States enhances different areas of cluster management. I found the videos helpful in better understanding VxRail.
The next entry in this blog series will address the advantage that VxRail offers in the update process.
Related Blog Posts
Impacting the World, One Happy Customer at a Time
Fri, 25 Aug 2023 21:45:55 -0000
As I get back from a lovely week to relax and reset by the beach in Mauritius, I have had time to realize that sometimes the important thing to do is to find time to relax and rejuvenate. I have come back with a burst of energy ready to get back to doing what I love most – spending time helping customers build simple Infrastructure solutions.
As a core member of the Dell Technologies infrastructure solutions sales team, I have come to realize that our core job is to solve problems. All businesses today are out there solving customer problems and challenges, either by producing goods or delivering services. Most businesses today have a lot of behind-the-scenes challenges to overcome to be able to help their customers.
Technology plays a big part in everything we do today, and IT teams must be on top of their game all the time to ensure businesses can continue to focus on what’s important – Customers!
I have had the opportunity to work with a non-profit organization that is literally making the world a better place for everyone globally. The work they do is non-stop and it is not easy. Their work requires an immaculate IT setup that needs to be always online, secure, and able to scale for their bespoke applications. Their current setup has gone through some major changes in terms of their applications and tracking methodologies. They had been experiencing multiple information and data silos, complexity in infrastructure management, and data security issues. In helping them find a way to simplify their IT, we too played a part in making the world a better place.
We had a few conversations and agreed that we needed to build the entire infrastructure on one platform. In this case, VMware was the unanimous choice. The two biggest challenges were to eliminate silos and to simplify management. HCI was the best way to achieve both, and we chose VxRail HCI systems. This solution went on to deliver a consistent platform across the edge, core, and cloud. It has proven to be a solution that can manage all of the compute, storage, and networking resources through a single pane of glass with vCenter, all under a single support umbrella for all of the hardware and software deployed.
Lifecycle management with BIOS, firmware, and software updates and upgrades can be a painful and time-consuming process. But what if I told you we can automate these tasks with one-click upgrades, one node at a time without any downtime – how does that sound? When I asked, the CTO was happy, and the IT manager was happier. All those investments in our R&D labs with over 100 people working on resolving some of the most common challenges -- like upgrades for IT teams around the world -- now made sense.
What made the solution choice easier was the ability to remotely monitor it from anywhere in the world with CloudIQ, and its ability to scale and grow, not just on premises but in the cloud, any cloud at any time.
Did we manage to resolve their IT challenges? Yes, with a simplified solution like VxRail that provides performance, management simplicity, automation of tasks, and the flexibility to grow and scale. The customer was delighted, knowing full well that they now have an infrastructure setup that helps them do all the work they do consistently and lets them expand their work to different geographic regions as well.
At the end of it all, I did enjoy my time off after helping build an infrastructure solution for an organization doing something so meaningful. While I was away, I got a postcard from the IT manager, who was able to take his wonderful family out for a nice little vacation, knowing that he could easily manage anything he needed to from anywhere in the world.
On to helping our next customer get the same peace of mind so they can leave their mark on the world too.
Author: Manish Bajaj
Take Advantage of the Latest Enhancements to VxRail Life Cycle Management
Wed, 24 Apr 2024 12:07:35 -0000
Providing the best life cycle management experience for HCI is not easy, nor is it a one-time job for which we can pat ourselves on the back and move on to the next endeavor. It’s a continuous cycle that incorporates feature enhancements and improvements based on your feedback. While we know that improving VxRail LCM is vitally important for us to continue to deliver differentiating value to you, it is just as important that your clusters continue to run the latest software to realize the benefits. In this post, I’ll provide a deep dive into the LCM enhancements introduced in the past few software releases so you can consider the added functionality that you can benefit from.
Focus areas for improved LCM
Going back into last year, we prioritized four focus areas to improve your LCM experience. While the value is incremental when you look at just a single software release, this post provides a holistic perspective of how VxRail has improved upon LCM over time to further increase the efficiencies that you enjoy today.
- Based on data that we have gathered on reported cluster update failures, we found that almost half of the update failures occurred because a node failed to enter maintenance mode. Effectively addressing this issue can potentially be the most impactful benefit for our customer base.
- As the VxRail footprint expands beyond the data center, resource constraints such as network bandwidth and Internet connectivity can become significant hurdles for effectively deploying infrastructure solutions at the edge. Recent enhancements in VxRail focused on creating space-efficient LCM bundle transfers.
- Doing more with less is a common thread across all organizations and industries. In the context of VxRail LCM, we’re looking to further simplify your cluster update planning experience by putting more actionable information at your fingertips.
- While no product, including VxRail, can avoid a failure from ever happening, VxRail looks to put you in a better position to protect your cluster and quickly recover from a failure.
Figure 1. 12+ month recap of LCM enhancements
Now that you know about the four focus areas, let’s get into the details about the actual improvements that have been introduced in the last 12+ months.
Mitigating maintenance mode failures
In our investigation, we identified three major issues that caused a cluster update to fail because a node did not enter maintenance mode as expected:
- VMtools was still mounted on a VM.
- VMs were pinned to a host due to an existing policy.
- vSAN resynchronization was taking too long and exceeded the timeout value.
In VxRail 7.0.350, prechecks were added for the first two issues. When a pre-update health check is run, these new VxRail prechecks identify those issues if they exist and alert you in the report so that you can remedy the issue before initiating a cluster update. In the same release, the timeout value to wait for a node to enter maintenance mode was doubled to reduce the chance that vSAN resynchronization does not finish in time.
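As a rough illustration of what prechecks for the first two issues might look like, here is a minimal sketch that scans a VM inventory and reports anything likely to block maintenance mode. The VmRecord type, its field names, and the maintenance_mode_prechecks function are assumptions invented for this example; they are not the VxRail precheck implementation or any VMware API.

```python
from dataclasses import dataclass


@dataclass
class VmRecord:
    # Hypothetical inventory record; field names are assumptions.
    name: str
    host: str
    tools_installer_mounted: bool = False
    pinned_to_host: bool = False


def maintenance_mode_prechecks(vms):
    """Return warnings for VMs likely to stop a host entering maintenance mode."""
    warnings = []
    for vm in vms:
        if vm.tools_installer_mounted:
            warnings.append(f"{vm.name} on {vm.host}: VMtools is still mounted")
        if vm.pinned_to_host:
            warnings.append(f"{vm.name} on {vm.host}: pinned to host by a policy")
    return warnings


inventory = [
    VmRecord("app-01", "node-1"),
    VmRecord("db-01", "node-2", tools_installer_mounted=True),
    VmRecord("web-01", "node-3", pinned_to_host=True),
]
warnings = maintenance_mode_prechecks(inventory)
for line in warnings:
    print(line)
```

The point of running a check like this before the update, rather than during it, is that each warning names a specific VM and host that you can fix ahead of the maintenance window.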
Next, the cluster update capability set was also enhanced to address a cluster update failure due to a node not entering maintenance mode as expected. With the combination of enhancements made to cluster update error handling and cluster update retry operations in VxRail 7.0.350 and VxRail 7.0.400 respectively, VxRail is now able to handle this scenario much more efficiently. If a node fails to enter maintenance mode, the cluster update operation now skips the node and continues on to the next node instead of failing out of the operation altogether. Upon running the cluster update retry operation, VxRail can automatically detect which node requires an update instead of updating the entire cluster.
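The skip-and-retry behavior can be sketched in a few lines. This is a simplified model under stated assumptions: run_cluster_update, the versions dictionary, and the enter_maintenance_mode callback are hypothetical names for illustration and do not represent VxRail Manager internals.

```python
def run_cluster_update(nodes, versions, target, enter_maintenance_mode):
    """Update nodes one at a time; skip any node that cannot enter maintenance mode.

    Returns the list of skipped nodes. Nodes already at the target version
    are left alone, which is what makes a retry touch only the stragglers.
    """
    skipped = []
    for node in nodes:
        if versions[node] == target:
            continue  # already updated in an earlier pass
        if not enter_maintenance_mode(node):
            skipped.append(node)  # record the node and move on, do not abort
            continue
        versions[node] = target
    return skipped


# Simulated cluster: node-2 is blocked during the first pass only.
versions = {"node-1": "old", "node-2": "old", "node-3": "old"}
blocked = {"node-2"}
attempts = []


def enter_mm(node):
    attempts.append(node)
    return node not in blocked


first_pass = run_cluster_update(list(versions), versions, "new", enter_mm)
blocked.clear()  # the operator fixes whatever was blocking node-2
second_pass = run_cluster_update(list(versions), versions, "new", enter_mm)
print(first_pass, second_pass, attempts)
```

In this model the retry only attempts node-2, because the other nodes already report the target version.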
Space-efficient LCM bundle transfers
The next area of improvement addressed reducing the package sizes of the LCM bundles. A smaller package size can be very beneficial for bandwidth-constrained environments such as edge locations.
VxRail 7.0.350 introduced the capability for you to designate a local Windows client at your data center to be the central repository and distributor of LCM bundles for remote VxRail clusters that are not connected to the Internet. Using a separate PowerShell cmdlet installed on the client, you can initiate space-efficient bundle transfers from the client to your remote clusters in your internal network. The transfer operation automatically scans the manifest of the Continuously Validated State (VxRail software version) running on the VxRail cluster and determines the delta compared to the requested LCM bundle. Instead of transferring the full LCM bundle, which is greater than 10 GB in size, it only packages the necessary installation files. A much smaller LCM bundle can cut down on bandwidth usage and transfer times.
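The delta idea can be illustrated with a minimal sketch that compares what a cluster already runs against what a bundle contains and keeps only the items that differ. The manifest layout, component names, versions, and sizes below are all invented for the example; the actual manifest format and the PowerShell tooling are not shown in this post.

```python
def plan_delta_transfer(cluster_manifest, bundle_manifest):
    """Return only the bundle items whose version differs from the cluster's."""
    return {
        name: item
        for name, item in bundle_manifest.items()
        if cluster_manifest.get(name) != item["version"]
    }


# Hypothetical manifests: component name -> version (cluster),
# component name -> version and payload size in MB (bundle).
cluster_manifest = {"hypervisor": "1.0", "manager": "1.0", "nic-firmware": "2.3", "bios": "4.1"}
bundle_manifest = {
    "hypervisor": {"version": "1.1", "size_mb": 600},
    "manager": {"version": "1.0", "size_mb": 3000},
    "nic-firmware": {"version": "2.3", "size_mb": 40},
    "bios": {"version": "4.2", "size_mb": 25},
}

delta = plan_delta_transfer(cluster_manifest, bundle_manifest)
delta_mb = sum(item["size_mb"] for item in delta.values())
full_mb = sum(item["size_mb"] for item in bundle_manifest.values())
print(sorted(delta), delta_mb, full_mb)
```

With these made-up numbers, only the hypervisor and BIOS payloads would be packaged, so the transfer is a fraction of the full bundle.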
Figure 2. Central repository and distributor of LCM bundles to remote VxRail clusters
In VxRail 7.0.450, space-efficient LCM bundles can also be created when VxRail Manager downloads an LCM bundle from the Dell cloud. This feature requires that the VxRail Manager be connected to the Dell cloud.
Simplified cluster update planning experience
The next set of LCM enhancements is centered around providing you with critical insights to maximize the probability of a successful cluster update and for the information to be up-to-date and readily available whenever you need it.
Since VxRail 7.0.400, the pre-update health check includes a RecoverPoint for VMs compatibility precheck to detect whether its current version of software is compatible with the target VxRail software version.
VxRail 7.0.450 increased the frequency at which the VxRail prechecks file is updated. The increased frequency ensures that any additional prechecks added by engineering because of technology changes or new learnings from support cases are incorporated into the VxRail prechecks file that is run against your cluster. When your cluster is connected to the Dell cloud, VxRail Manager periodically scans for the latest VxRail prechecks file.
VxRail 7.0.450 also automated the health check to run every 24 hours. The combination of automated VxRail prechecks file scans and health check runs ensures that you have access to an up-to-date health check report once you log in to VxRail Manager.
VxRail 7.0.450 also further simplified your cluster update planning experience by consolidating into a single, exportable report all the necessary insights about your cluster to help you decide whether to move forward with a cluster update. This update advisor report has four sections:
- VxRail Update Advisor Report Summary includes the current VxRail version running on the cluster, the target (or selected) VxRail version, estimated duration to complete a cluster update, a link to the release notes, and information about your backup for your service VMs.
Figure 3. Update advisor report—summary report
- VxRail Components shows which components need to be updated to get to the target VxRail version. The table includes the current version and target version for each component.
Figure 4. Update advisor report—components report
- VxRail Precheck is the previously mentioned pre-update health check report, inclusive of all the enhancements discussed.
Figure 5. Update advisor report—LCM precheck report
- VxRail Custom Components is a report that highlights user-managed components installed on the cluster. You should consider these custom components when deciding whether to schedule a cluster update.
Figure 6. Update advisor report—custom components report
When VxRail Manager is connected to the Dell cloud, it automatically scans for new update paths. Once a new update path is detected, VxRail Manager downloads a lightweight manifest file that contains all the information needed to produce the update advisor report. The report is automatically generated every 24 hours. This feature is designed to streamline the availability of up-to-date critical insights to help you make an informed decision about a cluster update.
Serviceability
The last set of LCM enhancements that I will cover is around serviceability. While many of the features discussed earlier are meant to be proactive and to prevent failures, there are times when failures can still occur. Being able to efficiently troubleshoot the issues is critically important to getting your clusters back up and running quickly.
In VxRail 7.0.410, the logging capability was enhanced in a couple of areas so that the Dell Support team can pinpoint issues faster. When a pre-update health check identifies failures, the offending host is now recorded. If a node does fail to enter maintenance mode, the logs now capture the reason for the failure.
In VxRail 7.0.450, we automated the backup of the VxRail Manager VM and vCenter Server VM (if it’s VxRail managed). Now you can easily back up your service VMs before updating a cluster.
Figure 7. Automate VxRail backup of service VMs before a cluster update
This feature is also integrated into the update advisor report, where you can see the latest backup on the report summary and click a link to go to the backup page to create another backup.
Value of VxRail life cycle management
If life cycle management is one of the major reasons that you chose to invest in VxRail, our continuous improvements to life cycle management should be a compelling reason to keep your clusters running the latest software. VxRail life cycle management continues to provide significant value by addressing the challenges that your organization faces today.
Figure 8. VxRail benefits (data from "The Business Value of Dell VxRail HCI," April 2023, IDC)
In an IDC study sponsored by Dell Technologies, The Business Value of Dell VxRail HCI, the value that VxRail LCM provides to organizations is significant and compelling. The results of this study are major proof points on why you should continue investing in VxRail to mitigate these challenges:
- Overburdened IT staff. The automated LCM and mechanisms in VxRail to maintain cluster integrity throughout the life of the cluster drive significant efficiencies in your IT infrastructure team.
- Unplanned outages that lead to significant disruption to businesses. The benefit of pretested and prevalidated sets of drivers, firmware, and software, which we call VxRail Continuously Validated States, is the significant reduction in risk as you update your HCI cluster from one version to the next.
- More time spent on deploying infrastructure and the resulting slowdown in the pace at which your business can innovate. The automation and integrated validation checks speed up deployment times without compromising security.
Conclusion
The emphasis that we put on improving your LCM experience is extraordinary, and we encourage you to maximize your investment in VxRail. Updating to the latest VxRail software release gives you access to the many LCM enhancements that can drive greater efficiencies in your organization. And with VxRail Continuously Validated States, you can safely get to the next software release and the ones that follow.
Resources
For more information about the features in VxRail 7.0.400, check out this blog post:
https://infohub.delltechnologies.com/p/learn-about-the-latest-major-vxrail-software-release-vxrail-7-0-400/
For more information about the features in VxRail 7.0.450, see this post:
https://infohub.delltechnologies.com/p/learn-about-the-latest-major-vxrail-software-release-vxrail-7-0-450/
If you want to learn about the latest in the VxRail portfolio, you can check the VxRail page on the Dell Technologies website:
https://www.dell.com/en-us/dt/converged-infrastructure/vxrail/index.htm
Author: Daniel Chiu, VxRail Technical Marketing
https://www.linkedin.com/in/daniel-chiu-8422287/