Scaling Up VxRail: Managing an Ecosystem
Tue, 08 Nov 2022 20:13:27 -0000
This is the sixth article in a series introducing VxRail concepts.
The engineering team behind VxRail has done a fantastic job building cluster and lifecycle management tools into our software. The cluster update process is an excellent example of one of these software enhancements. However, we need to go further. The value of these enhancements erodes as the number of clusters grows, because each additional cluster adds more actions required to manage the environment. That result is antithetical to the entire idea behind VxRail. Fortunately, the features and functionality of the VxRail API and CloudIQ keep this complexity from creeping back in. The API scales out management operations by exposing many of the same software calls that VxRail itself makes. CloudIQ is a cloud-based management utility that can interface with a wide range of Dell infrastructure, including VxRail, to improve cluster management as environments scale out.
Expanding and automating your VxRail environment
For readers who aren’t familiar with APIs, the acronym stands for “Application Programming Interface.” APIs exist to help two, or sometimes more, pieces of software communicate with each other. VxRail has its own API that works in conjunction with VMware APIs and the Redfish API for the iDRAC and hardware. Together, they enable management of the hardware and of both VMware and VxRail software at scale. The VxRail API Guide shows the full range of calls available to developers. There are dozens of them; the last count I saw was over 70 individual calls. There’s more to the API than its breadth, though: it is also simple to use. IT staff can work with it through the Swagger web interface or through a PowerShell module that provides familiar command-line interfaces.
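To make the idea concrete, here is a minimal Python sketch of how a script might prepare a call to the VxRail API. It assumes the API is served over HTTPS by VxRail Manager and accepts HTTP Basic authentication; the host name, credentials, and the `/rest/vxm/v1/system` path are placeholders for illustration, so confirm the exact endpoints and authentication scheme in the VxRail API Guide for your release. The sketch only builds the request and does not send it.

```python
import base64
import urllib.request


def build_vxrail_request(manager_host: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build, but do not send, an authenticated GET request to a VxRail Manager endpoint.

    Assumes HTTP Basic authentication; check the VxRail API Guide for your release.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{manager_host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    # Host name and credentials are placeholders; the path is illustrative.
    req = build_vxrail_request("vxm.example.local", "/rest/vxm/v1/system",
                               "administrator@vsphere.local", "changeme")
    print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, typically with an `ssl` context that trusts VxRail Manager's certificate.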
The API can help customers of any size, but the customers I see benefiting the most are large ones that might have tens to hundreds of nodes spread across many clusters. Environments at that scale need further automation to link VxRail clusters with existing management tools and practices. Example use cases include node discovery, to see what hardware is available and which versions it is running, and examining node and cluster health throughout the data center. The API can also enable infrastructure-as-code projects, such as automatically spinning clusters up and winding them down as needed. Even automating simple tasks, like shutting down clusters in a way that maintains data consistency, provides massive value to VxRail customers.
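The node discovery and health use cases above mostly come down to collecting per-host data from many clusters and summarizing it. The sketch below assumes a script has already gathered that data into plain dictionaries; the cluster names, host names, `health` values, and version strings are hypothetical sample data, not the VxRail API's actual response schema.

```python
from collections import Counter, defaultdict

# Hypothetical per-host records, shaped like what a script might assemble from
# each cluster's API responses. Field names and values are illustrative only.
FLEET = {
    "cluster-a": [
        {"host": "a-01", "health": "Healthy", "version": "8.0.000"},
        {"host": "a-02", "health": "Warning", "version": "8.0.000"},
    ],
    "cluster-b": [
        {"host": "b-01", "health": "Healthy", "version": "7.0.450"},
        {"host": "b-02", "health": "Healthy", "version": "7.0.450"},
    ],
}


def summarize_fleet(fleet):
    """Return (host count per version across all clusters, non-healthy hosts by cluster)."""
    versions = Counter()
    unhealthy = defaultdict(list)
    for cluster, hosts in fleet.items():
        for host in hosts:
            versions[host["version"]] += 1
            if host["health"] != "Healthy":
                unhealthy[cluster].append(host["host"])
    return versions, dict(unhealthy)


if __name__ == "__main__":
    version_counts, problem_hosts = summarize_fleet(FLEET)
    print(version_counts)
    print(problem_hosts)
```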
CloudIQ: Helping Manage Your Ecosystem
VxRail has more than the API to aid in managing large environments. As great as the API is, it takes a bit of preparation to use, whereas CloudIQ is ready for use as soon as Secure Connect Gateway is enabled and clusters are enrolled. If you haven’t heard of CloudIQ, I recommend checking out the CloudIQ simulator. The simulator doesn’t provide access to the complete feature set of CloudIQ but makes for an excellent introduction to what the product can do.
CloudIQ is a cloud-based application that monitors Dell storage, server, data protection, networking, HCI, and CI products, as well as APEX services, and helps resolve problems with them. You might see CloudIQ referred to as an AIOps application, short for artificial intelligence for IT operations. In the case of VxRail, customers’ clusters send their data in through Secure Connect Gateway, and CloudIQ then performs analytics on it. The output of these analytics can be used to create custom reports, estimate storage utilization, reduce IT risk, and recover from problems faster. From May into June, Dell ran a survey of CloudIQ users. Respondents reported accelerating IT recovery by anywhere from 2x to 10x, which saved them about an entire workday per week on average. CloudIQ provides all this with no financial or IT overhead, because it is freely available to Dell customers who connect to the Dell cloud.
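CloudIQ's own forecasting models aren't described here, but the basic idea behind a storage utilization estimate can be illustrated with a simple stand-in: fit a straight line to recent capacity samples and extrapolate to the point where the pool is full. The function below is a hypothetical sketch using ordinary least squares, not CloudIQ's method.

```python
def days_until_full(samples, capacity_tb):
    """Estimate days from the last sample until used capacity reaches capacity_tb.

    samples: list of (day, used_tb) pairs. Fits used_tb = intercept + slope * day by
    ordinary least squares. Returns None when the fitted trend is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((day - mean_x) * (used - mean_y) for day, used in samples)
    slope = sxy / sxx  # growth in TB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(day for day, _ in samples)
    return (capacity_tb - intercept) / slope - last_day
```

A linear fit is the simplest reasonable choice; workloads with bursts or seasonal patterns would call for more robust models.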
Growth is exciting, but it comes with new challenges, and old ones don’t go away; they get bigger. VxRail provides customers with an API designed to work with the iDRAC and VMware APIs to provide automation throughout the entire cluster stack. This helps customers reduce repetitive manual tasks and create infrastructure-as-code projects. With CloudIQ, IT staff can view their Dell infrastructure equipment from a single pane of glass. For VxRail, this includes software versions, cluster health scores, the ability to initiate updates, and other functionality. While the API offers most of its value to customers with very large VxRail footprints, almost all customers can benefit from CloudIQ to view multiple clusters alongside the rest of their Dell infrastructure equipment.
Related Blog Posts
Empowering Cloud-based Multi-cluster Management Using VxRail with CloudIQ
Fri, 18 Aug 2023 23:01:25 -0000
In today's digital landscape, organizations across industries are generating and accumulating vast amounts of data. To harness the potential of this explosive data growth, businesses rely heavily on cluster computing. Managing these clusters effectively is critical to optimizing performance and ensuring continuous operations. VxRail clusters provide extensive automation right out of the box, which helps administrators accomplish significantly more.
But as the number of clusters grows, a centralized management interface becomes more and more valuable. That’s why I wanted to talk to you about CloudIQ today and introduce three exciting new features:
- Support for 2-node and stretched vSAN clusters
- DPU monitoring
- Performance anomaly detection with historical seasonality
These advancements revolutionize cluster management because they offer enhanced efficiency, flexibility, and performance to meet the evolving needs of modern enterprises.
The evolution of cloud-based cluster management
Traditional on-premises cluster management frequently presents challenges with hardware maintenance, scalability, and costly infrastructure investments. CloudIQ has proved to be a game-changer for cluster management, allowing businesses to centralize the management of hardware and infrastructure in a single cloud-based tool.
By combining VxRail automation with CloudIQ, enterprises can focus on optimizing their applications and workflows while handling cluster provisioning, scaling, and maintenance more easily. This shift not only improves resource allocation and utilization but also enables organizations to adapt more quickly to dynamic workloads.
2-Node and stretched vSAN cluster support for CloudIQ
In response to diverse business needs, CloudIQ now supports two additional cluster deployment types: the 2-node and stretched vSAN clusters.
Traditionally, clusters required a minimum of three nodes to maintain high availability, because an odd number of votes lets one side of a network partition keep a majority, which avoids split-brain scenarios. 2-node clusters address the same challenge in a different way while still providing fault tolerance and high availability.
Instead of a third data node, a 2-node vSAN cluster uses a witness host, typically a lightweight virtual appliance running outside the cluster, as a tiebreaker. The witness holds only metadata, not virtual machine data, and the two nodes and the witness together determine quorum, so the cluster can keep running if one node or the link between the nodes fails. This setup significantly reduces infrastructure costs and is ideal for small to medium-sized businesses that require robust cluster management without the expense of additional nodes. 2-node clusters appear in the same location in CloudIQ as the rest of your clusters. To find them, select the Systems option under the Monitor tab, and then select the HCI inventory option; your enrolled VxRail clusters are listed there.
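The quorum reasoning behind 2-node clusters can be shown with a simplified majority-vote model. In vSAN 2-node deployments, the tiebreaker role is played by a witness host. This sketch is an illustration under an explicit simplifying assumption: it gives each data node and the witness one vote, whereas vSAN actually tracks votes per object component. The function names are hypothetical.

```python
def has_quorum(votes_visible: int, total_votes: int) -> bool:
    """A partition may keep serving data only if it sees a strict majority of all votes."""
    return 2 * votes_visible > total_votes


def partition_outcome(side_votes, total_votes):
    """Return, for each side of a network partition, whether it keeps quorum."""
    return [has_quorum(votes, total_votes) for votes in side_votes]


if __name__ == "__main__":
    # Two data nodes, no tiebreaker: a link failure leaves each side with 1 of 2 votes,
    # so neither side has a majority.
    print(partition_outcome([1, 1], total_votes=2))
    # Two data nodes plus a witness: node A still reaches the witness (2 of 3 votes),
    # node B is isolated (1 of 3), so only node A's side keeps quorum.
    print(partition_outcome([2, 1], total_votes=3))
```

Requiring a strict majority guarantees that at most one side of any partition can continue, which is exactly what prevents split-brain.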
Stretched vSAN clusters
Businesses often want to deploy clusters across geographically distributed data centers to improve disaster recovery and enhance business continuity. Stretched VxRail clusters with vSAN provide an excellent solution by extending a single vSAN cluster across two data sites, with a witness at a third location acting as the tiebreaker.
Key benefits of stretched VxRail clusters with vSAN include:
- Disaster Recovery: By replicating data between data centers, these clusters protect against site-wide outages, ensuring that operations continue seamlessly in case of a data center failure.
- Load Balancing: Stretched clusters intelligently distribute workloads across different data centers, based on demand, to optimize resource utilization and performance.
- Data Locality: Organizations can maintain data locality to comply with regional data regulations and reduce data access latency for end-users across different geographical regions.
Data Processing Unit Reporting
In a clustered environment, data processing units (DPUs) can become critical for efficient resource management. DPUs are hardware accelerators designed to offload specific data processing tasks, such as NSX networking and other services handled by the vSphere Distributed Services Engine, to enhance overall cluster performance for specific workloads.
The Data Processing Unit Reporting feature provides insight into the DPUs within the cluster. For each DPU, cluster administrators can view the host name of the server it is installed in, the DPU model, the OS version running on the host, the slot the DPU occupies, its serial number, and its manufacturer.
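A custom report built from this information could be as simple as a CSV export. The sketch below assumes the DPU details have already been retrieved; the `DpuRecord` type and its field names are hypothetical and simply mirror the attributes listed above, not CloudIQ's export format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class DpuRecord:
    # Hypothetical fields mirroring the attributes CloudIQ displays for each DPU.
    host_name: str
    model: str
    os_version: str
    slot: str
    serial_number: str
    manufacturer: str


def dpu_report_csv(records):
    """Render DPU records as CSV text, sorted by host name and then slot."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DpuRecord)])
    writer.writeheader()
    for record in sorted(records, key=lambda r: (r.host_name, r.slot)):
        writer.writerow(asdict(record))
    return buffer.getvalue()
```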
Performance Anomaly Detection
Unanticipated performance fluctuations can significantly impact application responsiveness and overall user experience. To address this concern, CloudIQ now integrates Performance Anomaly Detection—an intelligent monitoring feature that proactively identifies performance issues as they develop.
How does Performance Anomaly Detection work?
This feature uses machine learning algorithms to establish baseline performance patterns for various cluster metrics, including CPU utilization, memory utilization, power consumption, and networking.
When configured, the system continuously monitors real-time performance metrics and compares them to the baseline.
When CloudIQ detects a deviation from the expected behavior, it can raise alerts, enabling administrators to investigate and rectify potential problems promptly. This proactive approach helps address performance issues before they affect critical operations, reducing downtime and improving user satisfaction.
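The baseline-and-deviation approach can be sketched with a simple statistical stand-in. CloudIQ uses machine learning models whose details aren't covered here; this hypothetical example instead keeps a separate mean and standard deviation for each hour of the day, which captures the idea of historical seasonality: a value that is normal at 9:00 may be anomalous at 2:00. A sample is flagged when its z-score exceeds a threshold (3 by default, an arbitrary choice).

```python
from collections import defaultdict
from statistics import mean, pstdev


def seasonal_baseline(history):
    """Build a per-hour-of-day baseline from (hour, value) samples: {hour: (mean, std)}."""
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    return {hour: (mean(values), pstdev(values)) for hour, values in buckets.items()}


def is_anomalous(hour, value, baseline, threshold=3.0, min_std=1e-6):
    """Flag a sample whose z-score against its hour's baseline exceeds threshold.

    min_std keeps a perfectly flat baseline from causing a division by zero.
    Raises KeyError if the hour has no history yet.
    """
    mu, sigma = baseline[hour]
    return abs(value - mu) / max(sigma, min_std) > threshold


if __name__ == "__main__":
    # Hypothetical CPU utilization history: busy mornings, quiet nights.
    history = [(9, v) for v in (70, 72, 68, 70)] + [(2, v) for v in (10, 12, 8, 10)]
    baseline = seasonal_baseline(history)
    print(is_anomalous(9, 71, baseline))  # 71% at 9:00 fits the morning pattern
    print(is_anomalous(2, 70, baseline))  # 70% at 2:00 is far outside the night pattern
```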
As the demand for efficient data processing and storage continues to grow, cloud-based cluster management becomes vital for modern enterprises. The introduction of 2-node and stretched vSAN cluster support, data processing unit reporting, and performance anomaly detection takes cluster management to new heights. By leveraging the cutting-edge features of CloudIQ with VxRail, business organizations can unlock unparalleled efficiency, scalability, and performance, gaining a competitive advantage in today's fast-paced digital landscape. Embracing cloud-based cluster management with CloudIQ and its new features will undoubtedly pave the way for a bright and productive future for organizations and industries of all sizes.
Author: Dylan Jackson
Optimize and Streamline Data Center Operations with VxRail
Fri, 31 Mar 2023 19:07:36 -0000
Winter will soon be coming to an end, and Spring will be right behind. More outdoor life will resume as the season of refreshment and renewal approaches. In that spirit, I want to reseed the VxRail field of knowledge. One of the most critical areas of improvement that VxRail offers is in lifecycle management; just like how an attentive gardener can improve a plant’s life and fruit output, so too can VxRail improve and enhance the lifecycle management of clusters. We’ll take a look at how VxRail accomplishes this objective within this blog and the linked videos.
VxRail Manager is an on-cluster virtual machine that makes cluster management more accessible than ever through VMware vCenter. Much as the tractor transformed wheat harvesting, VxRail Manager transforms how we manage clusters; in both cases, people can accomplish much more. This virtual machine enables VxRail functionality in customer environments, providing automation solutions built by our VxRail team and supporting customer-built scripts through an API. Later in this blog, we’ll discuss four of these automation features: update prechecks, cluster update cycles, compliance checks, and cluster expansion. These functions, along with many others, are also available by API call, which enables customers to craft their own automation solutions on top of what VxRail already provides. The automation features included with VxRail, such as cluster shutdown, are readily consumed within vCenter, eliminating the need for administrators to familiarize themselves with yet another interface.
Enhanced update prechecks
The first significant part of the VxRail HCI System Software we’ll cover in this blog is update prechecks. The VxRail update precheck helps administrators confirm that a cluster is healthy enough to upgrade. It performs hundreds of checks, but let’s focus on just a couple of them as examples.
Updates can halt for a variety of reasons. For example, if a cluster has insufficient storage, the whole cluster will fail to update. If a node was left in maintenance mode, it would not update. Running the optional VxRail update precheck helps avoid these headaches; the linked video shows the precheck in action. Just as a tractor tills a field to prepare it for more successful seeding, the precheck prepares clusters for more successful upgrades. VxRail Manager identifies issues like storage availability with this check, but it doesn’t stop there: it also gives administrators a recommended solution through a link to a Dell Knowledge Base article. Running the precheck takes a fraction of the time an update requires and can save large amounts of labor and time by avoiding problems both large and small. So, while the precheck is optional, I’d recommend running it before every update cycle. Everyone can appreciate the ability to stop most issues before they take root.
Prechecks aren’t the only improvement, however. Once VxRail Manager has confirmed that our clusters are healthy, the next step is to apply an update. This is where some of the most critical enhancements of a VxRail solution come in.
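The precheck's role can be illustrated with a toy version of the two checks mentioned above. This is a hypothetical sketch, not VxRail Manager's precheck logic: the `ClusterSnapshot` and `Node` types are invented for illustration, and the 30 percent free-capacity threshold is an assumed value rather than a documented VxRail requirement.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    in_maintenance_mode: bool


@dataclass
class ClusterSnapshot:
    capacity_tb: float
    used_tb: float
    nodes: list


def precheck(cluster: ClusterSnapshot, min_free_fraction: float = 0.30) -> list:
    """Return human-readable findings; an empty list means both toy checks passed."""
    findings = []
    # Check 1: enough free storage for the update to proceed (threshold is an assumption).
    free_fraction = 1 - cluster.used_tb / cluster.capacity_tb
    if free_fraction < min_free_fraction:
        findings.append(f"free capacity {free_fraction:.0%} is below {min_free_fraction:.0%}")
    # Check 2: no node was left in maintenance mode.
    for node in cluster.nodes:
        if node.in_maintenance_mode:
            findings.append(f"{node.name} is in maintenance mode")
    return findings
```

Returning a list of findings rather than stopping at the first failure mirrors the value of a precheck: administrators see every problem at once and can fix them before starting the update window.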
Updating a traditional non-VxRail cluster is a process with a significant number of steps. Administrators need to create baseline images for each cluster and add any drivers, firmware, or other vendor add-ons each time. VxRail differs by implementing what we’ve come to call “Continuously Validated States,” which make the LCM experience much better in several ways. The first is that they remove the need to create baselines and cluster profiles; the linked video gives an overview of the update process. The VxRail team assembles the majority of the components needed in an update cycle, so administrators need to add only components for devices like GPUs and Fibre Channel controllers, plus additional VMware software such as NSX-T, to the update packages. This saves administrators time and effort, and it also provides major long-term stability benefits. VxRail doesn’t just throw a bunch of component updates into a zip file and call it a day. These update packages undergo over 100,000 hours of testing in a multimillion-dollar VxRail testing facility, helping clusters reach as high as 99.9999 percent uptime in some cases. Administrators can then use the update packages created by the VxRail team to move their clusters from one Continuously Validated State to the next, keeping their VxRail environment in a known-good state as it moves through the cluster lifecycle.
The update packages created by VxRail can be used in two different ways. Clusters with Internet access can pull these files through the network using the Internet Update button in the UI. Air-gapped clusters can move between Continuously Validated States through the Offline Update option, which uses local bundle files downloaded from the Dell support site. Returning to our gardening analogies, you can certainly till a plot with a garden hoe, but you can do much more, much more easily, with a rotary tiller. Similarly, administrators can manage a DIY HCI environment manually, but it will be significantly more difficult.
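Conceptually, moving between Continuously Validated States means comparing what is installed against a target set of component versions and updating everything that differs as one unit. The hypothetical sketch below shows that comparison; the component names and version strings are placeholders, and the real update bundles and their manifests are not modeled here.

```python
def plan_update(installed: dict, target_state: dict):
    """Compare installed component versions with a target validated state.

    Returns (changes, unmanaged):
      changes   -> {component: (installed_version or None, target_version)}
      unmanaged -> sorted components installed locally but absent from the target
                   package, such as add-ons an administrator supplies separately
    """
    changes = {
        component: (installed.get(component), version)
        for component, version in target_state.items()
        if installed.get(component) != version
    }
    unmanaged = sorted(set(installed) - set(target_state))
    return changes, unmanaged


if __name__ == "__main__":
    # Placeholder component names and versions.
    installed = {"esxi": "1.0", "bios": "2.0", "gpu_driver": "5.1"}
    target = {"esxi": "1.1", "bios": "2.0", "vxrail_manager": "3.0"}
    print(plan_update(installed, target))
```

Surfacing `unmanaged` separately reflects the point above that some components, such as GPU drivers, are added by the administrator rather than shipped in the package.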
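For the offline path, it is good practice to confirm that a downloaded bundle arrived intact before using it. Assuming a SHA-256 checksum is published alongside the bundle on the download page, a short sketch using Python's standard `hashlib` module can check it; the function names are hypothetical.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so large bundles never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded bundle file against a published SHA-256 checksum."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # In-memory demo; for a real bundle, call bundle_matches(path, published_checksum).
    print(sha256_of_stream(io.BytesIO(b"abc")))
```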
If you’d like to know more about the new VxRail 8.0.000 release, you can read an article on it on the Info Hub Blog page.
Confirming the cluster state
Update automation ensures that VxRail clusters remain in a Continuously Validated State during update cycles, but clusters spend most of their lives completing business operations. VxRail Manager includes a compliance checker tool that helps ensure that clusters continue to adhere to their Continuously Validated State and identify any installed versions that drift away from it. This check is available on demand within the update interface of the UI. The compliance checker tool examines each node in a cluster, collects the current version set, and then displays the output in a way that calls out problem items specifically. When you consider that a significant part of the value that VxRail brings to the table is based on using and adhering to Continuously Validated States, it’s easy to see how helpful an automated inspection tool can be.
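The compliance checker's job, collecting each node's version set and calling out items that drift from the validated state, can be sketched as follows. The data shapes and component names are hypothetical; the real tool's inputs and output format are not described in this blog.

```python
def compliance_report(validated_state: dict, nodes: dict) -> list:
    """List every drifted item as (node, component, installed, expected).

    validated_state: {component: expected_version}
    nodes: {node_name: {component: installed_version}}
    A component missing from a node is reported as "missing".
    """
    drift = []
    for node in sorted(nodes):
        installed = nodes[node]
        for component in sorted(validated_state):
            expected = validated_state[component]
            actual = installed.get(component, "missing")
            if actual != expected:
                drift.append((node, component, actual, expected))
    return drift


if __name__ == "__main__":
    # Placeholder versions for a three-node cluster.
    state = {"bios": "2.1", "esxi": "8.0.1"}
    nodes = {
        "n1": {"bios": "2.1", "esxi": "8.0.1"},
        "n2": {"bios": "2.0", "esxi": "8.0.1"},
        "n3": {"bios": "2.1"},
    }
    for item in compliance_report(state, nodes):
        print(item)
```

Reporting only the drifted items, sorted by node and component, keeps the output focused on the problem areas rather than on every version that already matches.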
Easier and faster growth
Moving along, VxRail also makes cluster growth easier. VxRail supports heterogeneous hardware, which means customers can add different nodes as needed to best address their resource demands, such as different node models or even different hardware generations within one cluster. There are software improvements as well. When an administrator adds a new node or nodes, VxRail Manager scans the hosts available on the network for their software and firmware versions and confirms compatibility with the rest of the cluster. The linked video shows this check, which is powered by the same concept behind VxRail updates: Continuously Validated States and the certainty their use provides. Clusters aren’t static environments. Updates happen, and differing versions in the environment could make compatibility checks a new hurdle! VxRail alleviates this pinch point with the node image management tool. This tool, also known as NIM, works much like an enhanced Rapid Appliance Self-Recovery (RASR) operation. Whereas RASR reimages a single node through several steps, NIM can run multiple node reimaging operations in parallel. The tool and instructions for its use can be found in the SolVe online procedures.
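The benefit of reimaging in parallel rather than one node at a time can be sketched with Python's `concurrent.futures`. The `reimage_node` function here is a hypothetical placeholder that only sleeps; it does not model what NIM or RASR actually do on a node.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def reimage_node(node_name: str, image_version: str) -> tuple:
    """Hypothetical placeholder for one reimage; a real job would drive the node's installer."""
    time.sleep(0.05)  # simulate waiting on a long-running remote operation
    return node_name, image_version


def reimage_in_parallel(node_names, image_version: str, max_workers: int = 4) -> dict:
    """Reimage several nodes concurrently and map each node to the image it received."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda name: reimage_node(name, image_version), node_names)
        return dict(results)


if __name__ == "__main__":
    print(reimage_in_parallel(["node-1", "node-2", "node-3"], "image-A"))
```

Threads fit this kind of job because each worker spends most of its time waiting on a remote operation rather than using the local CPU, so total time approaches that of the slowest node instead of the sum of all nodes.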
Managing an HCI environment takes a lot of work, with many tasks to accomplish. VxRail automates many of these tasks, particularly when it comes to lifecycle management. The automation enhancements don’t stop with what has been engineered into the software. VxRail also features an API that exposes the same calls and features used in the UI for script automation. Our API features over 100 calls, providing the toolkit necessary to create all kinds of special-purpose automation solutions. If VxRail is our tractor, the API toolkit helps customers create their own cultivators, levelers, hay balers, and more! The precheck, update, and compliance check enhancements we talked about earlier in this blog are all examples of actions that the API can take. As environments grow larger, management options need to grow with them. The VxRail API offers an expansive toolkit for developers to build the tools that businesses need.
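Chaining calls like the precheck and update into a script usually means starting a long-running operation and then polling its status. The sketch below assumes the API returns a request ID for such operations that can be queried for a state; the `fetch_status` callable and the `COMPLETED` and `FAILED` state names are hypothetical stand-ins, so check the VxRail API Guide for the actual endpoints and values. Passing `fetch_status` and `sleep` in as parameters keeps the polling logic testable without a live cluster.

```python
import time
from typing import Callable

# Hypothetical terminal state names; use the values documented for your API version.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_request(
    fetch_status: Callable[[str], str],
    request_id: str,
    interval_s: float = 30.0,
    timeout_s: float = 7200.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(request_id) until it reports a terminal state.

    Raises TimeoutError once about timeout_s seconds of polling intervals have elapsed.
    """
    waited = 0.0
    while True:
        state = fetch_status(request_id)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"request {request_id} still {state} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

A script could, for example, start a precheck, wait on it with this helper, and begin the update only if the result is `COMPLETED`.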
Enhancing cluster lifecycle management and enabling administrators to manage more infrastructure are core parts of the VxRail offering. We covered how VxRail improves lifecycle management and empowers administrators with the update precheck that ensures clusters are ready to upgrade safely. We also covered the improved cluster update process with reduced labor time, compliance checks that ensure that cluster nodes adhere to their Continuously Validated States between upgrade cycles, and the cluster expansion tool with image control. The VxRail API can interface with these processes, enabling custom automation solutions within customer data centers. Just like how the humble tractor provided productivity that was previously unimaginable, VxRail provides the automation to maintain data centers more efficiently than ever. For more information about VxRail, visit the VxRail Info Hub.
Author: Dylan Jackson, Engineering Technologist