Optimize and Streamline Data Center Operations with VxRail
Fri, 31 Mar 2023 19:07:36 -0000
Winter will soon be coming to an end, and Spring will be right behind. More outdoor life will resume as the season of refreshment and renewal approaches. In that spirit, I want to reseed the VxRail field of knowledge. One of the most critical areas of improvement that VxRail offers is in lifecycle management; just like how an attentive gardener can improve a plant’s life and fruit output, so too can VxRail improve and enhance the lifecycle management of clusters. We’ll take a look at how VxRail accomplishes this objective within this blog and the linked videos.
VxRail Manager is an on-cluster virtual machine that makes cluster management more accessible than ever through VMware vCenter. Much like how the tractor transformed harvesting wheat, VxRail Manager transforms how we manage clusters. Both cases enable people to accomplish much more. This virtual machine enables VxRail functionality in customer environments, providing automation solutions built by our VxRail team and support for customer-built scripts through an API. The automation features we’ll discuss later in this blog are update prechecks, cluster update cycles, compliance checks, and cluster expansion. These functions, as well as many others, are also available by API call, which enables customers to craft their own automation solutions in addition to what VxRail already provides. The automation features included with VxRail, such as cluster shutdown, are readily consumed within vCenter, eliminating the need for administrators to familiarize themselves with yet another interface.
Enhanced update prechecks
The first significant part of the VxRail HCI System Software we’ll cover in this blog is update prechecks. The VxRail update precheck massively assists administrators by helping to confirm that a cluster is healthy enough to upgrade. This precheck script performs hundreds of tasks, but, as an example, let’s focus on just a couple of them.
Updates can halt for a variety of reasons. For example, if a cluster has insufficient storage, the whole cluster will fail to update. If a node is left in maintenance mode, it will not update. Running the optional VxRail update precheck helps avoid these headaches. To get a look at the precheck, check out the accompanying video. Just like how a tractor will till a field to prepare it for more successful seeding, the precheck also helps prepare clusters for more successful upgrades. VxRail Manager identifies issues like storage availability with this check, but it doesn’t stop there. VxRail Manager then provides administrators with a solution recommendation through a link to a Dell Knowledge Base article. Running the precheck takes a fraction of the time an update requires and has the potential to save massive amounts of labor and time by avoiding problems both large and small. So, while the precheck is optional, I’d recommend running it before every update cycle. The ability to prevent most issues from taking root is one everyone can appreciate.
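To make the two failure examples above a little more concrete, here is a minimal Python sketch of what checks like these might look like. This is not the VxRail precheck itself, which performs hundreds of tasks. The `Node` fields, the `run_precheck` name, and the 20 percent free-storage threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical fields for illustration; not the VxRail data model.
    name: str
    in_maintenance_mode: bool


def run_precheck(nodes, free_storage_pct, min_free_pct=20.0):
    """Return a list of readable findings; an empty list means nothing was flagged."""
    findings = []
    # Assumed threshold: the real storage requirement is not stated in this blog.
    if free_storage_pct < min_free_pct:
        findings.append(
            f"cluster: only {free_storage_pct:.0f}% storage free, need {min_free_pct:.0f}%"
        )
    for node in nodes:
        if node.in_maintenance_mode:
            findings.append(f"{node.name}: node is in maintenance mode")
    return findings


cluster = [Node("node-01", False), Node("node-02", True)]
for finding in run_precheck(cluster, free_storage_pct=12.0):
    print(finding)
```

Returning a list of findings instead of stopping at the first problem mirrors the idea that a precheck should surface everything that needs attention before the update window opens.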
Prechecks aren’t the only improvement, however. Now that VxRail Manager has confirmed that our clusters are healthy, we would want to apply an update. This is where we discuss some of the most critical enhancements a VxRail solution provides.
Updating a traditional non-VxRail cluster is a process with a significant number of steps. Administrators need to create baseline images for each cluster and add any drivers, firmware, or other vendor add-ons each time. The difference in VxRail comes with implementing what we’ve come to call “Continuously Validated States.” Continuously Validated States make the LCM experience much better in several ways, the first of which is removing the need to create baselines and cluster profiles. The accompanying video gives an overview of the update process, but in short, Continuously Validated States handle this work. The VxRail team assembles the majority of the components needed in an update cycle, so only devices like GPUs, Fibre Channel controllers, and additional VMware software components, like NSX-T, need to be added to the update packages. This helps administrators save time and effort, but it also provides massive long-term stability benefits. VxRail doesn’t just throw a bunch of component updates into a zip file and call it a day. These update packages undergo extensive testing in a multimillion-dollar VxRail testing facility for over 100,000 hours to help clusters reach as high as 99.9999 percent uptime in some cases. Administrators can then use the update packages created by the VxRail team to move their clusters from one Continuously Validated State to the next, keeping their VxRail environment in a known-good, happy state as it moves through the cluster lifecycle.
The update packages created by VxRail can be used in two different ways. Clusters with Internet access can pull these files through the network using the Internet Update button in the UI. Air-gapped clusters can move between Continuously Validated States through the Offline Update option, which uses local bundle files downloaded from the Dell support site. Returning to our gardening and cultivation analogies, you can certainly till a plot with a garden hoe, but you can do much more, much more easily, with a rotary tiller. Similarly, administrators can manage a DIY HCI environment manually, but it will be significantly more difficult.
If you’d like to know about the new VxRail 8.0.000 release, you can read an article on it on the Info Hub Blog page.
Confirming the cluster state
Update automation ensures that VxRail clusters remain in a Continuously Validated State during update cycles, but clusters spend most of their lives completing business operations. VxRail Manager includes a compliance checker tool that helps ensure that clusters continue to adhere to their Continuously Validated State and identifies any installed versions that drift away from it. This check is available on demand within the update interface of the UI. The compliance checker tool examines each node in a cluster, collects the current version set, and then displays the output in a way that calls out problem items specifically. When you consider that a significant part of the value that VxRail brings to the table is based on using and adhering to Continuously Validated States, it’s easy to see how helpful an automated inspection tool can be.
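The idea behind a drift check like this can be sketched in a few lines of Python. This is only an illustration of the concept of comparing each node’s version set against an expected state, not the compliance checker’s actual implementation. The `find_drift` function, the component names, and the version strings are all made up for the example.

```python
def find_drift(validated_state, node_versions):
    """Compare each node's installed versions with the expected set.

    Returns {node: {component: (expected, installed)}} for mismatches only.
    """
    drift = {}
    for node, installed in node_versions.items():
        mismatches = {
            component: (expected, installed.get(component))
            for component, expected in validated_state.items()
            if installed.get(component) != expected
        }
        if mismatches:
            drift[node] = mismatches
    return drift


# Made-up component names and version strings, for illustration only.
validated_state = {"hypervisor": "8.0.0", "nic_firmware": "22.0.5"}
node_versions = {
    "node-01": {"hypervisor": "8.0.0", "nic_firmware": "22.0.5"},
    "node-02": {"hypervisor": "8.0.0", "nic_firmware": "21.5.9"},
}

for node, mismatches in find_drift(validated_state, node_versions).items():
    for component, (expected, installed) in mismatches.items():
        print(f"{node}: {component} is {installed}, expected {expected}")
```

Only nodes with mismatches appear in the result, which matches the goal of calling out problem items specifically instead of listing every version on every node.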
Easier and faster growth
Moving along, VxRail also enhances cluster growth. VxRail provides heterogeneous hardware support, which means customers can add different nodes as needed to best address their resource demands. For example, this could mean adding different node models or even different hardware generations to a cluster. There are software improvements as well. When an administrator goes to add their new node or nodes, VxRail Manager scans the hosts available on the network for their software and firmware versions and confirms compatibility with the rest of the cluster. You can view the check in the accompanying video, but it’s powered by the same concept behind VxRail updates, that being Continuously Validated States and the certainty their use provides. Clusters aren’t static environments: updates happen, and different versions in the environment could make compatibility checks a new hurdle! VxRail alleviates this pinch point with the use of the node image management tool. This tool, also known as NIM, is much like an enhanced Rapid Appliance Self-Recovery, or RASR, operation. Where a RASR will reimage a particular node with several steps, NIM allows for multiple node reimaging operations in parallel. The tool and instructions for its use can be found with SolVe online procedures.
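The difference between reimaging one node at a time and reimaging several in parallel can be illustrated with a short Python sketch. This says nothing about how NIM works internally. The `reimage_node` function is a hypothetical placeholder that only returns a string, and the node names and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def reimage_node(node_name):
    # Placeholder: a real reimage would take far longer and touch hardware.
    return f"{node_name}: reimaged"


def reimage_in_parallel(node_names, max_workers=4):
    """Run the placeholder reimage for several nodes at once, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reimage_node, node_names))


print(reimage_in_parallel(["node-05", "node-06", "node-07"]))
```

With a serial approach, total time grows with every node added. Running the operations side by side keeps the wait closer to the time of the slowest single node, which is the benefit the paragraph above describes.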
Managing an HCI environment takes a lot of work with many tasks to accomplish. VxRail automates many of these tasks, particularly when it comes to lifecycle management. The automation enhancements don’t stop with what has been engineered into the software. VxRail also features an API that exposes the same calls and features used in the UI for script automation. Our API features over 100 calls, providing the toolkit necessary to create all kinds of special-purpose automation solutions. If VxRail is our tractor, the API toolkit helps customers create their own cultivators, levellers, hay balers, and more! The precheck, updates, and compliance check enhancements we talked about earlier in this blog are all examples of actions that the API can take. As environments grow larger, management options need to grow with them. The VxRail API offers an expansive toolkit for developers to build the tools that businesses need.
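For readers who want a feel for what scripting against a REST API like this involves, here is a minimal Python sketch that builds an authenticated request without sending it. The host name, credentials, the `/rest/vxm` base path, the `/v1/system` endpoint, and the use of HTTP basic authentication are all assumptions for illustration. Check the VxRail API Guide for the actual calls and authentication requirements in your release.

```python
import base64
import urllib.request


def build_api_request(manager_host, path, username, password):
    """Build (but do not send) an authenticated GET request."""
    # Assumed base path; confirm against the API guide for your release.
    url = f"https://{manager_host}/rest/vxm{path}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials.
req = build_api_request("vxm.example.local", "/v1/system", "admin@example.local", "secret")
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with appropriate certificate handling. Keeping request construction in its own function makes it easy to reuse for other calls and to test without a live cluster.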
Enhancing cluster lifecycle management and enabling administrators to manage more infrastructure are core parts of the VxRail offering. We covered how VxRail improves lifecycle management and empowers administrators with the update precheck that ensures clusters are ready to upgrade safely. We also covered the improved cluster update process with reduced labor time, compliance checks that ensure that cluster nodes adhere to their Continuously Validated States between upgrade cycles, and the cluster expansion tool with image control. The VxRail API can interface with these processes, enabling custom automation solutions within customer data centers. Just like how the humble tractor provided productivity that was previously unimaginable, VxRail provides the automation to maintain data centers more efficiently than ever. For more information about VxRail, visit the VxRail Info Hub.
Author: Dylan Jackson, Engineering Technologist
Related Blog Posts
Recovering Clusters Faster: VxRail Serviceability
Tue, 08 Nov 2022 20:13:28 -0000
This is the fifth article in a series introducing VxRail concepts.
Every tool or piece of equipment out there requires maintenance of some kind. That’s as true for the cars we drive as it is for the servers, storage, and switches that power our data centers. However, a lot of data center maintenance is reactive. Look at hardware failure as an example. If a drive were to fail in one of your clusters, nothing would happen until IT staff respond. VxRail offers the ability to automate some of these responses. Let’s talk about what happens when things go sideways in a cluster’s life.
Help Righting the Ship
One of the roles that the VxRail Manager VM fills is a centralized alert collector. VxRail integrates with the iDRAC to monitor hardware health and with vCenter to monitor VMware software, in addition to its own internal alerts and events. VxRail monitors all this information and creates a more holistic monitoring system for a cluster. This obviously benefits IT staff, but there are some additional benefits to this multi-level integration that other solutions might struggle to match.
VxRail uses a service called “Secure Connect Gateway” to establish a static connection to Dell data centers. This enables a lot of functionality on VxRail, including with CloudIQ for multi-cluster management, but that’s the subject of a future discussion. This static connection helps technical support become more proactive in helping you recover your clusters. For example, say you had a disk fail. If Secure Connect Gateway is enabled, VxRail would dial home and create a case automatically. Support could then use this to confirm the disk failure and confirm that there aren’t any other hardware or software issues being raised. Depending on what warranty services you have, you could even opt to have a replacement hard drive sent out automatically. It wasn’t uncommon for me to see support cases where we were the first to let the administrators know that there was an issue. It was definitely nice to be able to tell them a correction was already on its way out to them.
These phone homes that go through the Secure Connect Gateway add more value than helping to automate parts of some dispatches. The gateway also aids in the support experience. It can do this in a few ways, including providing an integrated support chat applet, accessible from the VxRail Support tab in vCenter. Secure Connect Gateway also facilitates the transfer of the system logs needed to troubleshoot almost any problem in the VxRail stack. These logs include the VxRail Manager virtual machine logs, vCenter logs, ESXi logs, iDRAC logs, and platform logs. The vCenter and ESXi logs are specific to the software powering the cluster. The iDRAC and platform logs contain the hardware inventory, LCM activity, out-of-band hardware log, and more.
I’ve touched on a lot of topics surrounding the support experience, but there’s one more that absolutely needs to be mentioned: the people in support! The technical support staff standing behind VxRail are a very talented and knowledgeable group of folks. Many of these agents are VMware Certified Professionals, and some are pursuing higher levels of certification, like the VCIX, one of VMware’s expert-level certifications. This cumulative knowledge pool allows our support team to resolve over 95% of the incidents they encounter without needing a higher-level VMware engagement. However, in the instances where a VMware engagement is needed, for example when a bug is discovered in vCenter, VxRail support can escalate to VMware on the end customer’s behalf. This helps to create continuity in the support experience that might be missing from a solution without the jointly engineered nature of VxRail.
Servicing clusters can become a challenge, no matter the environment. Hardware and software both encounter failures that require an IT staff response. As environments grow and scale, the challenge of maintaining health for the environment grows, too. To help meet this expanding problem, VxRail helps administrators by automatically collecting events and alerts from the hardware and both VMware and VxRail software. This information can then be compressed into log bundles that can be shared with support. Contacting support is even easier, thanks to an integrated chat connecting your host to VxRail support staff. These support staff are specialists in both VMware and VxRail software, capable of resolving a vast majority of all issues with a single vendor. Our final discussion will be on the extensibility of VxRail, featuring CloudIQ and the VxRail API. See you there!
Scaling Up VxRail: Managing an Ecosystem
Tue, 08 Nov 2022 20:13:27 -0000
This is the sixth article in a series introducing VxRail concepts.
The engineering team behind VxRail has done a fantastic job building cluster and lifecycle management tools into our software. The cluster update process is an excellent example of one of these software enhancements. However, we need to go further. The value of these enhancements decays a bit as you have more and more clusters, resulting in more and more actions required to manage an environment. This end result is antithetical to the entire idea behind VxRail. However, this reintroduction of complexity is avoided thanks to the features and functionality of the VxRail API and CloudIQ. The API scales out management operations by providing access to many of the same software calls that VxRail makes. Then we have CloudIQ. CloudIQ is a cloud-based management utility that interfaces with a range of Dell infrastructure products, and VxRail uses it to improve cluster management as environments scale out.
Expanding and automating your VxRail environment
For readers who aren’t familiar with what APIs are, the acronym stands for “Application Programming Interface.” APIs exist to help two, or sometimes more, pieces of software communicate with each other. VxRail has its own API that works in conjunction with VMware APIs and the Redfish API for the iDRAC and hardware. This enables the management of hardware and both VMware and VxRail software at scale. The VxRail API Guide shows the full range of calls available to developers. There are dozens of them; the last number I saw was over 70 individual calls. Now, there’s more to the API than its comprehensive nature. It is also simple to use. The API can be accessed through the Swagger web interface or through a PowerShell module, which provides the kind of simple command line interface that IT staff are familiar with.
The API can help customers of any size, but the customers I see benefiting the most are large ones that might have tens to hundreds of nodes across many clusters. The scale of these environments creates a need for further automation that can link VxRail clusters with management tools and practices. Some use-case examples include node discovery, to see what hardware is available and the versions running on that hardware, and examining node and cluster health throughout the data center. The API can also enable infrastructure-as-code projects, such as automatically spinning up and winding down clusters as needed. Even automating simple tasks, like the shutdown of clusters in a way that maintains data consistency, provides a massive value to VxRail customers.
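As a rough illustration of the health use case, the Python sketch below rolls up health across several clusters so that the ones needing attention stand out. The `summarize_health` function, the host names, and the health strings are hypothetical. The `fetch_health` argument stands in for whatever API call a real script would make, and this example uses canned data instead of a live cluster.

```python
def summarize_health(cluster_hosts, fetch_health):
    """Group clusters by reported health so problem clusters stand out.

    fetch_health is any callable that takes a host name and returns a
    health string; in practice it would wrap an API call.
    """
    summary = {}
    for host in cluster_hosts:
        try:
            status = fetch_health(host)
        except OSError:
            # Treat network-level failures as their own category.
            status = "unreachable"
        summary.setdefault(status, []).append(host)
    return summary


# Stand-in data instead of live API calls.
fake_results = {"vxm-a.example.local": "Healthy", "vxm-b.example.local": "Warning"}
print(summarize_health(fake_results, fake_results.__getitem__))
```

Passing the fetch function in as a parameter keeps the roll-up logic separate from the transport, so the same loop works whether the data comes from a REST call, a cache, or a test fixture.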
CloudIQ: Helping Manage Your Ecosystem
VxRail has more than the API to aid in managing large environments. As great as the API is, it takes a bit of preparation to use, whereas CloudIQ is ready for use as soon as Secure Connect Gateway is enabled and clusters are enrolled. If you haven’t heard of CloudIQ, I recommend checking out the CloudIQ simulator. The simulator doesn’t provide access to the complete feature set of CloudIQ but makes for an excellent introduction to what the product can do.
CloudIQ is a cloud-based application that monitors and resolves problems with Dell storage, server, data protection, networking, HCI, and CI products, and APEX services. You might see CloudIQ referred to as an AIOps application. This is short for artificial intelligence for IT operations. In the case of VxRail, customers’ clusters send their data through Secure Connect Gateway, and CloudIQ then performs analytics on it. The output of these analytics can be used to create custom reports, create various estimates on storage utilization, reduce IT risk, and recover from problems faster. Beginning in May and continuing into June, Dell ran a survey of CloudIQ users. These users were able to accelerate IT recovery by anywhere from 2x to 10x, which saved them about an entire workday per week, on average. CloudIQ provides all this to customers with no financial or IT overhead because it is freely available to Dell customers connecting to the Dell cloud.
Growth is exciting, but it comes with new challenges, and old ones don’t go away; they get bigger. VxRail provides customers with an API designed to work with the iDRAC and VMware APIs to provide automation throughout the entire cluster stack. This helps customers reduce repetitive labor tasks and create infrastructure-as-code projects. Then with CloudIQ, IT staff can get a view of their Dell infrastructure equipment from one pane of glass. For VxRail, this would include software versions, cluster health scores, the ability to initiate updates, and other functionality. While the API offers most of its value to customers with very large VxRail footprints, almost all customers can benefit from CloudIQ to view multiple clusters as well as the remainder of their Dell infrastructure equipment.