Easing Life Cycle Management with VxRail
Thu, 13 Oct 2022 22:47:20 -0000
This is the second article in a series introducing VxRail concepts.
I mentioned in the introduction blog that I previously worked in Technical Support for Dell. That experience really set the stage for me to embrace VxRail because the VxRail approach to life cycle management eases a lot of the pain points I saw in support engagements. Many of the issues I saw were resolved with system updates, and VxRail makes moving through the life cycle significantly easier than with traditional hardware or an internally built solution. We do this with our state management model, known as Continuously Validated States. Let’s take some time to understand what these are, because they help enable VxRail customers to do more with their infrastructure more easily than before.
Defining a state
I’m someone who likes to be thorough, so if you already understand what a system state is, then you can skip this section. But for readers newer to infrastructure, this might be a different way to think about things. A system state, be it good or bad, refers to the hardware, firmware, drivers, and system software that power the infrastructure. When your servers or clusters are in a “good” or “happy” state, then everything is working optimally. A “bad” or “faulty” state might have a compatibility issue creating crashes, or it might contain failed hardware. Replacing failed hardware is an example of modifying the hardware state. Modifying the software state might look like an update to VMware software. All these changes then represent new individual states.
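For readers who think in code, the idea above can be sketched as a small immutable record. This is only an illustration: the `SystemState` class, its four fields, and every version string below are hypothetical and do not come from any VxRail tool. The point is that a change never edits a state in place. It produces a new one.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SystemState:
    """One snapshot of what powers a cluster (illustrative fields only)."""
    hardware: str
    firmware: str
    drivers: str
    software: str


# Hypothetical values, chosen only to show the idea.
current = SystemState(
    hardware="node-model-x",
    firmware="fw-1.0",
    drivers="drv-2.3",
    software="sw-7.0",
)

# A software update yields a new, distinct state; the old one is unchanged.
after_update = replace(current, software="sw-7.1")

print(current == after_update)  # the two states differ
print(current.software)         # the original state still reports sw-7.0
```

Freezing the record mirrors the way the article talks about states: each combination of hardware, firmware, drivers, and software is its own individual state.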
VxRail takes the chaos out of traditional state management for customers and replaces it with confidence. VxRail Continuously Validated States make the exchange from chaos to confidence possible. Updating a cluster, such as to a new vCenter version, means changing a cluster, and that change introduces uncertainty. That uncertainty is natural because customers are moving their infrastructure into new unknown configurations.
Let’s discuss the “Validated” portion of Continuously Validated States. VxRail Engineering validates the current state, the state you intend to go to, and the continuity through the update cycle. Customers can gain tremendous value by relying on VxRail Engineering to validate all three aspects of an upgrade. This is the “Validated” part of Continuously Validated States that completely inverts the experience I got used to while working in Technical Support.
Moving to a new state
When you make a change, such as adding a driver or updating system software, you are modifying the system state. Changing system states has always been a problem, and each remediation strategy has revealed new IT challenges of its own. I believe the challenge that Continuously Validated States best addresses can be described as, “I need my infrastructure to help me respond to new business needs and make moving through the life cycle as easy as possible.” Modifying an HCI cluster designed internally would present additional difficulties because you don’t know what kind of behavior to expect without testing.
This kind of change anxiety is what the validation step in our state-creation process aims to correct. Before the VxRail Engineering team releases a new VxRail update package (a package that would change your cluster’s system state), the package is tested in the team’s dedicated testing facility for nearly 800,000 cumulative hours. The facility has comprehensive access to the hardware that VxRail supports, allowing thorough testing. The purpose of this testing is to first ensure that all the new supported configurations are stable and then ensure that the move from old cluster states to the new states is a reliable process.
The creation of a series of known-good configurations isn’t the only benefit VxRail can provide with this different approach to state management. Let’s talk about the continuity that Continuously Validated States provide. VxRail clusters spend their entire lives conforming with and moving between different configurations supported and defined by the Continuously Validated State. This creates a continuity that begins from the time a cluster is first unloaded from the truck, persists through the changes of both the update cycle and hardware modification, and continues on to the final point of cluster retirement.
Let’s tie these ideas together. I like to think of Continuously Validated States as being like a GPS that helps avoid road construction during a cluster’s life. VxRail can do this because our engineering teams are building the roads and identifying the best routes. Go ahead and imagine a map for me. I like to imagine a map of my home state. No matter what kind of map, it’s going to have a bunch of points and show you how to move from one point to another. Continuously Validated States serve a similar role for your clusters. Much like the points on your map, each of these states verifies new hardware and software versions for customers to move their clusters to. These states serve another role like that of a GPS—they help identify the ideal paths between states and help clusters efficiently move between them. As you might have guessed, the Continuously Validated States model isn’t simple cartography. This ideal path is identified through hundreds of thousands of testing hours performed by VxRail Engineering team members in a massive million-dollar lab environment. Those movement paths, in combination with software tooling in the update process, create continuity for clusters as they move between states and proceed through their life cycles.
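To make the map analogy concrete, here is a minimal sketch that treats states as points and validated moves as one-way roads, then asks for the shortest route between two points. Everything in it is an assumption made for illustration: the state names, the `VALIDATED_PATHS` table, and the `find_update_path` function are hypothetical, and a breadth-first search stands in for whatever logic the real tooling uses.

```python
from collections import deque

# Hypothetical states and validated moves between them (illustration only).
VALIDATED_PATHS = {
    "state-A": ["state-B"],
    "state-B": ["state-C", "state-D"],
    "state-C": ["state-D"],
    "state-D": [],
}


def find_update_path(graph, start, target):
    """Return the shortest list of states from start to target, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        for next_state in graph.get(current, []):
            if next_state not in visited:
                visited.add(next_state)
                queue.append(path + [next_state])
    return None  # no validated route exists


print(find_update_path(VALIDATED_PATHS, "state-A", "state-D"))
```

In this toy map the route from `state-A` to `state-D` goes through `state-B` and skips `state-C`. Asking for a route that was never validated, such as going backward from `state-D`, returns `None`.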
Hopefully, this blog has helped clarify how Continuously Validated States change configuration management for the better. Changing the configuration state of production clusters is an anxiety-generating action that VxRail eases by creating, testing, and validating known-good configuration states for customers. The result is that customers can update their equipment with more confidence than ever and spend more IT resources focused on enabling business projects than on performing maintenance tasks. Mike Athanasiou, a colleague of mine, did a fantastic job with our Interactive Journey video series. In the videos, Mike shows how the use of Continuously Validated States enhances different areas of cluster management. I found the videos helpful in better understanding VxRail.
The next entry in this blog series will address the advantage that VxRail offers in the update process.
Related Blog Posts
Learn About the Latest Major VxRail Software Release: VxRail 7.0.400
Thu, 22 Sep 2022 13:11:44 -0000
As many parts of the world welcome the fall season and the cooler temperatures that it brings, one area that has not cooled down is VxRail. The latest VxRail software release, 7.0.400, introduces a slew of new features that will surely fire up our VxRail customers and spur them to schedule their next cluster update.
VxRail 7.0.400 provides support for VMware ESXi 7.0 Update 3g and VMware vCenter Server 7.0 Update 3g. All existing platforms that support VxRail 7.0 can upgrade to VxRail 7.0.400. Upgrades from VxRail 4.5 and 4.7 are supported, which is an important consideration because standard support from Dell for those versions ends on September 30.
VxRail 7.0.400 software introduces features in the following areas:
- Life cycle management
- Dynamic nodes
- Configuration flexibility
This blog delves into major enhancements in those areas. For a more comprehensive rundown of the features added to this release, see the release notes.
Life cycle management
Because life cycle management is a key area of value differentiation for our VxRail customers, the VxRail team is continuously looking for ways to further enhance the life cycle management experience. One aspect that has come into recent focus is handling cluster update failures caused by VxRail nodes failing to enter maintenance mode.
During a cluster update, nodes are put into maintenance mode one at a time. Their workloads are moved onto the remaining nodes in the cluster to maintain availability while the nodes go through software, firmware, and driver updates. VxRail 7.0.350 introduced capabilities to notify users of situations such as host pinning and mounted VM tools on the host that can cause nodes to fail to enter maintenance mode, so users can address those situations before initiating a cluster update.
VxRail 7.0.400 addresses this cluster update failure scenario even further by being smarter with how it handles this issue once the cluster update is in operation. If a node fails to enter maintenance mode, VxRail automatically skips that node and moves on to the next node. Previously, this scenario would cause the cluster update operation to fail. Now, users can run that cluster update and process as many nodes as possible. Users can then run a cluster update retry, which targets only the nodes that were skipped. The combination of skipping nodes and targeted retry of those skipped nodes significantly improves the cluster update experience.
Figure 1: Addressing nodes failing to enter maintenance mode
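The skip-and-retry behavior can be sketched in a few lines. This is a simplified model, not VxRail code: `run_cluster_update`, the node names, and the two callback functions are hypothetical stand-ins for the real maintenance mode and update steps.

```python
def run_cluster_update(nodes, enter_maintenance_mode, update_node):
    """Update nodes one at a time, skipping any that cannot enter maintenance mode.

    Both callbacks are supplied by the caller; enter_maintenance_mode returns
    True when the node entered maintenance mode successfully.
    """
    updated, skipped = [], []
    for node in nodes:
        if not enter_maintenance_mode(node):
            skipped.append(node)  # skip instead of failing the whole operation
            continue
        update_node(node)
        updated.append(node)
    return updated, skipped


# Hypothetical scenario: node-2 has a pinned VM during the first pass.
blocked = {"node-2"}


def can_enter(node):
    return node not in blocked


updated, skipped = run_cluster_update(
    ["node-1", "node-2", "node-3"], can_enter, lambda node: None
)

# The administrator resolves the issue, then retries only the skipped nodes.
blocked.clear()
retried, still_skipped = run_cluster_update(skipped, can_enter, lambda node: None)

print(updated, skipped, retried, still_skipped)
```

The retry reuses the same routine with a shorter list, which is the essence of a targeted retry: nodes that already updated are never touched again.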
In VxRail 7.0.400, a Dell RecoverPoint for VMs compatibility check has been added to the update advisory report, cluster update pre-check, and cluster update operation to inform users of a potential incompatibility scenario. Having data protection in an unsupported state puts an environment at risk. The addition of the compatibility check is great news for RecoverPoint for VMs users because this previously manual task is now automated, helping to reduce risk and streamline operations.
VxRail dynamic nodes
Since the introduction of VxRail dynamic nodes last year, we’ve incrementally added more storage protocol support for increased flexibility. NFS, CIFS, and iSCSI support were added earlier this year. In VxRail 7.0.400, users can configure their VxRail dynamic nodes with storage from Dell PowerStore using NVMe over Fabrics with TCP transport (NVMe-oF/TCP). NVMe provides much faster data access compared to SATA and SAS. The support requires Dell PowerStoreOS 2.1 or later and Dell PowerSwitch with the virtual Dell SmartFabric Storage Service appliance.
VxRail cluster deployment using NVMe-oF/TCP is not much different from setting up iSCSI storage as the primary datastore for VxRail dynamic node clusters. The cluster must go through the Day 1 bring-up activities to establish IP connectivity. From there, the user can set up the port group, VMkernel adapters, and NVMe-oF/TCP adapter to access the storage shared from the PowerStore.
Setting up NVMe-oF/TCP between the VxRail dynamic node cluster and PowerStore is separate from the cluster deployment activities. You can find more information about deploying NVMe-oF/TCP here: https://infohub.delltechnologies.com/t/smartfabric-storage-software-deployment-guide/.
VxRail 7.0.400 also adds VMware Virtual Volumes (vVols) support for VxRail dynamic nodes. Cluster deployment with vVols over Fibre Channel follows a workflow similar to cluster deployment with a VMFS datastore. Provisioning and zoning of the Virtual Volume needs to be done before the Day 1 bring-up. The VxRail Manager VM is installed onto the datastore as part of the Day 1 bring-up.
For vVols over IP, the Day 1 bring-up needs to be completed first to establish IP connectivity. Then the Virtual Volume can be mounted and a datastore can be created from it for the VxRail Manager VM.
Figure 2: Workflow to set up VxRail dynamic node clusters with VMware Virtual Volumes
VxRail 7.0.400 introduces the option for customers to deploy a local VxRail-managed vCenter Server with their VxRail dynamic node cluster. The Day 1 bring-up installs a vCenter Server onto the cluster with a 60-day evaluation license, but the customer is required to purchase their own vCenter Server license. VxRail customers are accustomed to having a Standard edition vCenter Server license packaged with their VxRail purchase. However, that vCenter Server license is bundled with the VMware vSAN license, not the VMware vSphere license, which is why it does not come with a dynamic node cluster.
VxRail 7.0.400 supports the use of Dell PowerPath/VE with VxRail dynamic nodes, which is important to many storage customers who have been relying on PowerPath software for multipathing capabilities. With VxRail 7.0.400, VxRail dynamic nodes can use PowerPath with a PowerStore, PowerMax, or Unity XT storage array over the NFS, iSCSI, or NVMe over Fibre Channel storage protocol.
Security
Another topic that continues to burn bright, no matter the season, is security. As threats continue to evolve, it’s important to continue to advance security measures for the infrastructure. VxRail 7.0.400 introduces capabilities that make it even easier for customers to further protect their clusters.
While the security configuration rules set forth by the Security Technical Implementation Guide (STIG) are required for customers working in or with the U.S. federal government and Department of Defense, other customers can benefit from hardening their own clusters. VxRail 7.0.400 automatically applies a subset of the STIG rules on all VxRail clusters. These rules protect VM controls and the underlying SUSE Linux operating system controls. Application of the rules occurs without any user intervention upon an upgrade to VxRail 7.0.400 and at the cluster deployment with this software version, providing a seamless experience. This feature increases the security baseline for all VxRail clusters starting with VxRail 7.0.400.
Digital certificates are used to verify the external communication between trusted entities. VxRail customers have two options for digital certificates. Self-signed certificates use VxRail as the certificate authority to sign the certificate. Customers use this option if they don’t need a Certificate Authority or choose not to pay for the service. Otherwise, customers can import a certificate signed by a Certificate Authority to the VxRail Manager. Both options require certificates to be shared between the VxRail Manager and vCenter Server for secure communication to manage the cluster.
Previously, both options required manual intervention, at varying levels, to manage certificate renewals and ensure uninterrupted communication between the VxRail Manager and the vCenter Server. Loss of communication can affect cluster management operations, though not the application workloads.
Figure 3: Workflow for managing certificates
With VxRail 7.0.400, all areas of managing certificates have been simplified to make it easier and safer to import and manage certificates over time. Now, VxRail certificates can be imported via the VxRail Manager and API. There’s an API to import the vCenter certificate into the VxRail trust store. Renewals can be managed automatically via the VxRail Manager so that customers do not need to constantly check for expiring certificates and replace them. Alternatively, new API calls have been created to perform these activities. While these features simplify the experience for customers already using certificates, hopefully the simplified certificate management will encourage more customers to use certificates to further secure their environment.
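To show the kind of manual chore that automatic renewal removes, here is a rough sketch of an expiry check. It does not call any VxRail API: the `certificates_due_for_renewal` function, the certificate names, the dates, and the 30-day renewal window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def certificates_due_for_renewal(expiry_by_name, now, window_days=30):
    """Return the names of certificates that expire within window_days of now."""
    cutoff = now + timedelta(days=window_days)
    return sorted(
        name for name, expires in expiry_by_name.items() if expires <= cutoff
    )


# Hypothetical certificates and expiry dates.
now = datetime(2022, 9, 22, tzinfo=timezone.utc)
expiry_by_name = {
    "manager-cert": datetime(2022, 10, 1, tzinfo=timezone.utc),
    "vcenter-cert": datetime(2023, 6, 1, tzinfo=timezone.utc),
}

due = certificates_due_for_renewal(expiry_by_name, now)
print(due)
```

With these sample dates, only `manager-cert` falls inside the window. A check like this has to run on a schedule to be useful, which is exactly the burden that built-in renewal management takes off the administrator.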
VxRail 7.0.400 also introduces an end-to-end upgrade bundle integrity check. This feature has been added to the pre-update health check and the cluster update operation. The signing certificate is verified to ensure the validity of the root certificate authority. The digital certificate is verified. The bundle manifest is also checked to ensure that the contents in the bundle have not been altered.
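The manifest portion of that check can be illustrated with a short sketch. It covers only the idea of comparing file contents against recorded digests. The manifest layout, the use of SHA-256, the file names, and the `find_altered_files` function are assumptions, and the certificate verification steps are left out entirely.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_altered_files(manifest: dict, files: dict) -> list:
    """Return manifest entries whose content is missing or does not match."""
    altered = []
    for name, expected_digest in manifest.items():
        content = files.get(name)
        if content is None or sha256_hex(content) != expected_digest:
            altered.append(name)
    return sorted(altered)


# Hypothetical bundle contents, held in memory to keep the sketch self-contained.
files = {
    "hypervisor.zip": b"hypervisor payload",
    "firmware.bin": b"firmware payload",
}
manifest = {name: sha256_hex(data) for name, data in files.items()}

# Simulate tampering with one file after the manifest was produced.
tampered = {**files, "firmware.bin": b"modified payload"}

print(find_altered_files(manifest, files))
print(find_altered_files(manifest, tampered))
```

A digest comparison only proves the contents match the manifest. That is why the manifest itself has to be trusted first, which is the job of the certificate verification described above.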
Configuration flexibility
With any major VxRail software release come enhancements in configuration flexibility. VxRail 7.0.400 provides more flexibility for base networking and more flexibility in using and managing satellite nodes.
Previous VxRail software releases introduced long-awaited support for dynamic link aggregation for vSAN and vSphere vMotion traffic and support for two vSphere Distributed Switches (VDS) to separate management traffic from vSAN and vMotion traffic. VxRail 7.0.400 removes the previous port count restriction of four ports for base networking. Customers can now also deploy clusters with six or eight ports for base networking while employing link aggregation or multiple VDS, or both.
Figure 4: Two VDS with six NIC ports
Figure 5: Two VDS with eight NIC ports with link redundancy for vMotion traffic and link aggregation for vSAN traffic
With VxRail 7.0.400, customers can convert their vSphere Standard Switch on their satellite nodes to a customer-managed VDS after deployment. This support allows customers to more easily manage their VDS and satellite nodes at scale.
Serviceability
The most noteworthy serviceability enhancement I want to mention is the ability to create service tickets from the VxRail Manager UI. This functionality makes it easier for customers to submit service tickets, which can speed resolution time and improve the feedback loop for providing product improvement suggestions. This feature requires an active connection with the Embedded Service Enabler to Dell Support Services. Customers can submit up to five attachments to support a service ticket.
Figure 6: Input form to create a service request
VxRail 7.0.400 is no doubt one of the more feature-heavy VxRail software releases in some time. Customers big and small will find value in the capability set. This software release enhances existing features while also introducing new tools that further focus on VxRail operational simplicity. While this blog covers the highlights of this release, I recommend that you review the release notes to further understand all the capabilities in VxRail 7.0.400.
Enhancing Satellite Node Management at Scale
Tue, 15 Mar 2022 20:30:40 -0000
Satellite nodes are a great addition to the VxRail portfolio, empowering users at the edge, as described in David Glynn’s blog Satellite Nodes: Because sometimes even a 2-node cluster is too much. Although satellite nodes are still new, we’ve been working hard and have already started making improvements. Dell’s latest VxRail 7.0.350 release has a number of new VxRail enhancements and in this blog we’ll focus on these new satellite node features:
- Improved life cycle management (LCM)
- New APIs
- Improved security
The first way we’ve improved satellite nodes is by reducing the required maintenance window. To do this, the satellite node update process has now been split in two. Instead of staging the recovery bundle and performing the update in one step, you can now stage the recovery bundle and perform the update separately.
Staging the bundle in advance is great because we know bandwidth can be limited at the edge. Staging early allows ample time to transfer the bundle so that the update itself happens during your scheduled maintenance window. Once your bundles are staged, it’s as simple as scheduling the updates and letting VxRail execute the node update. This improvement helps you complete the update within the expected timeframe and minimize downtime, which matters because satellite nodes sit outside the cluster and, as a result, workloads go offline while the node is updated.
Do you have a large number of edge locations that could use satellite nodes and need an easier way to manage at scale? Good news! These new APIs are perfect for making edge life at scale easier.
The new APIs include:
- Satellite node LCM
- Add a satellite node to a managed folder
- Remove a satellite node from a managed folder
The introductory release of VxRail satellite nodes featured LCM operations through the VxRail Manager plug-in, which could be quite time consuming if you are managing a large number of satellite nodes. We saw room for improvement so now administrators can use VxRail APIs to add, update, and remove satellite nodes to simplify and speed up operations.
You can use the satellite node LCM API to adjust configuration settings that benefit management at scale, such as adjusting the number of satellite nodes you want to update in parallel. For example, although the default is to update 20 nodes in parallel, you can initiate updates for up to 30 satellite nodes in parallel, as needed.
There is also a failure rate feature that will set a condition to exit from an LCM operation. For example, if you are updating multiple satellite nodes at one time and nodes are failing to update, the failure rate setting is a way to abort the operation altogether if the rate surpasses a set threshold. The default threshold is 20% but can be set anywhere from 1% to 100%. Using the VxRail API, you can adjust settings like this that are not available in the VxRail Manager.
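Here is a minimal sketch of how the two settings interact. It models the behavior described above and is not the VxRail API: the `update_satellite_nodes` function, its parameter names, and the node names are hypothetical. The defaults and upper limits come from the text, while the lower limit of one parallel node and the way the failure rate is computed (failed nodes divided by all nodes in the operation, checked after each batch) are assumptions.

```python
def update_satellite_nodes(nodes, update_node, parallel=20, failure_rate_pct=20):
    """Update nodes in batches and abort once the failure rate passes the threshold.

    update_node is a caller-supplied function that returns True on success.
    """
    if not 1 <= parallel <= 30:
        raise ValueError("parallel must be between 1 and 30")
    if not 1 <= failure_rate_pct <= 100:
        raise ValueError("failure_rate_pct must be between 1 and 100")

    succeeded, failed = [], []
    for start in range(0, len(nodes), parallel):
        batch = nodes[start:start + parallel]
        # A real implementation would update the batch concurrently;
        # a plain loop keeps this sketch short.
        for node in batch:
            (succeeded if update_node(node) else failed).append(node)
        # Abort when the share of failed nodes surpasses the threshold.
        if len(failed) * 100 > failure_rate_pct * len(nodes):
            return {
                "status": "aborted",
                "succeeded": succeeded,
                "failed": failed,
                "pending": nodes[start + parallel:],
            }
    return {"status": "completed", "succeeded": succeeded, "failed": failed, "pending": []}


# Hypothetical fleet of ten satellite nodes where three fail to update.
nodes = [f"sat-{i}" for i in range(10)]
bad = {"sat-0", "sat-1", "sat-2"}
result = update_satellite_nodes(nodes, lambda node: node not in bad, parallel=5)
print(result["status"], len(result["pending"]))
```

In this run, three failures out of ten nodes exceed the default 20% threshold after the first batch, so the operation stops and the remaining five nodes are left untouched. Raising `failure_rate_pct` to 50 would let the same run finish.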
These new APIs are great for users with a large number of VxRail satellite nodes. Adding, removing, and updating satellite nodes can now be automated through the new APIs, saving you precious time across your edge locations.
VxRail satellite nodes can now use Secure Enterprise Key Management (SEKM), made available through the Dell PowerEdge servers that VxRail is built on. What is SEKM, you might ask? Well, SEKM gives you the ability to secure drive access using encryption keys stored on a central key management server (not on the satellite node).
SEKM is great for many reasons. First, an edge location might be more exposed and have less physical security than your typical data center but that doesn’t mean securing your data is any less important. SEKM keeps your data drives locked even if the entire server is stolen. When paired with self-encrypting drives, you can secure the data even further. Second, the encryption keys are stored in a centralized location, making it easier to manage the security of large numbers of satellite nodes instead of having to manage each satellite node individually.
In this blog we’ve highlighted some exciting new satellite node features, including an improved update process, new APIs, and enhanced security, all of which enhance managing the edge at scale. Check out the full VxRail 7.0.350 release to see the full list of enhancements.
Thanks for reading!
Author: Stephen Graham, VxRail Tech Marketing