Exploring the customer experience with lifecycle management for vSAN Ready Nodes and VxRail clusters
Thu, 24 Sep 2020 19:41:49 -0000|
Read Time: 0 minutes
The difference between VMware vSphere LCM (vLCM) and Dell EMC VxRail LCM is still a trending topic that most HCI customers and prospects want more information about. While we compared the two methods at a high level in our previous blog post, let’s dive into the more technical aspects of the LCM operations of VMware vLCM and VxRail LCM. The detailed explanation in this blog post should give you a more complete understanding of your role as an administrator for cluster lifecycle management with vLCM versus VxRail LCM.
Even though vLCM has introduced a vast improvement in automating cluster updates, lifecycle management is more than executing cluster updates. With vLCM, lifecycle management is still very much a customer-driven endeavor. By contrast, VxRail’s overarching goal for LCM is operational simplicity, by leveraging Continuously Validated States to drive cluster LCM for the customer. This is a large part of why VxRail has over 8,600 customers since it was launched in early 2016.
In this blog post, I’ll explain the four major areas of LCM:
- Defining the initial baseline configuration
- Planning for a cluster update
- Executing the cluster update
- Sustaining cluster integrity over the long term
Defining the initial baseline configuration
The baseline configuration is a vital part of establishing a steady state for the life of your cluster. The baseline configuration is the current known good state of your HCI stack. In this configuration, all the component software and firmware versions are compatible with one another. Interoperability testing has validated full stack integrity for application performance and availability while also meeting security standards in place. This is the ‘happy’ state for you and your cluster. Any changes to the configuration use this baseline to know what needs to be rectified to return to the ‘happy’ state.
How is it done with vLCM?
vLCM depends on the hardware vendor to provide a Hardware Management Services virtual machine. Dell provides this support for its Dell EMC PowerEdge servers, including vSAN ReadyNodes. I’ll use this implementation to explain the overall process. Dell EMC vSAN ReadyNodes use the OpenManage Integration for VMware vCenter (OMIVV) plugin to connect to and register with the vCenter Server.
Once the VM is deployed and registered, you need to create a credential-based profile. This profile captures two accounts: one for the out-of-band hardware interface, the iDRAC, and the other for the root credentials for the ESXi host. Future changes to the passwords require updating the profile accordingly.
With the VM connection and profile in place, a Catalog XML file is used by vLCM to define the initial baseline configuration. To create the Catalog XML file, you need to install and configure the Dell Repository Manager (DRM) to build the hardware profile. Once a profile is defined to your specification, it must then be exported and stored on an NFS or CIFS share. The profile is then used to populate the Repository Profile data in the OMIVVV UI. If you are unsure of your configuration, refer to the vSAN Hardware Compatibility List (HCL) for the specific supported firmware versions. Once the hardware profile is created, you can then associate it with the cluster profile. With the cluster profile defined, you can enable drift detection. Any future change to the Catalog XML file is done within the DRM.
It’s important to note that vLCM was introduced in vSphere 7.0. To use vLCM, you must first update or deploy your clusters to run vSphere 7.x.
How is it done with VxRail LCM?
With VxRail, when the cluster arrives at the customer data center, it’s already running in a ‘happy’ state. For VxRail, the ‘happy’ state is called Continuously Validated States. The term is pluralized because VxRail defines all the ‘happy’ states that your cluster will update to over time. This means that your cluster is always running in a ‘happy’ state without you needing to research, define, and test to arrive at Continuously Validated States throughout the life of your cluster. VxRail – well, specifically the VxRail engineering team - does it for you. This has been a central tenet of VxRail since the product first launched with vSphere 6.0. Since then it has helped customers transition to vSphere 6.5, 6.7, and now 7.0.
Once the VxRail cluster initialization is completed, use your Dell EMC Support credentials to configure the VxRail repository setting within vCenter. VxRail Manager plugin to vCenter will automatically connect to the VxRail repository at Dell EMC and pull down the next available update package.
Figure 1 Defining the initial baseline configuration
Planning for a cluster update
Updates are a constant in IT, and VMware is constantly adding new capabilities or product/security fixes that require updating to newer versions of software. Take for example the vSphere 7.0 Update 1 release that VMware and Dell Technologies just announced. Those eye-opening features are available to you when you update to that release. You can check out just how often VMware has historically updated vSphere here: https://kb.vmware.com/s/article/2143832.
As you know, planning for a cluster update is an iterative process with inherent risk associated with it. Failing to plan diligently can cause adverse effects on your cluster, ranging from network outages and node failure to data unavailability or data loss. That said, it’s important to mitigate the risk where you can.
How is it done with vLCM?
With vLCM, the responsibility of planning for a cluster update rests on the customers’ shoulders, including the risk. Understanding the Bill of Materials that makes up your server’s hardware profile is paramount to success. Once all the components are known, and a target version of vSphere ESXi is specified, the supported driver and firmware version needs to be investigated and documented. You must consult the VMware Compatibility Guide to find out which drivers/firmware are supported for each ESXi release.
It is important to note that although vLCM gives you the toolset to apply firmware and driver updates, it does not validate compatibility or support for each combination for you, except for the HBA Driver. This task is firmly in the customer’s domain. It is advisable to validate and test the combination in a separate test environment to ensure that no performance regression or issues are introduced into the production environment. Interoperability testing can be an extensive and expensive undertaking. Customers should create and define robust testing processes to ensure that full interoperability and compatibility is met for all components managed and upgraded by vLCM.
With Dell EMC vSAN Ready Nodes, customers can rest assured that the HCL certification and compatibility validation steps have been performed. However, the customer is still responsible for interoperability testing.
How is it done with VxRail LCM?
VxRail engineering has taken a unique approach to LCM. Rather than leaving the time-consuming LCM planning to already overburdened IT departments, they have drastically reduced the risk by investing over $60 million, more than 25,000 hours of testing for major releases, and more than 100 team members into a comprehensive regression test plan. This plan is completed prior to every VxRail code release. (This is in addition to the testing and validation performed by PowerEdge, on which VxRail nodes are built.)
Dell EMC VxRail engineering performs this testing within 30 days of any new VMware release (even quicker for express patches), so that customers can continually benefit from the latest VMware software innovations and confidently address security vulnerabilities. You may have heard this called “synchronous release”.
The outcome of this effort is a single update bundle that is used to update the entire HCI stack, including the operating system, the hardware’s drivers and firmware, and management components such as VxRail Manager and vCenter. This allows VxRail to define the declarative configuration we mentioned previously (“Continuously Validated States”), allowing us to move easily from one validated state to the next with each update.
Figure 2 Planning for a cluster update
Executing the cluster update
The biggest improvement with vLCM is its ability to orchestrate and automate a full stack HCI cluster update. This simplifies the update operation and brings enormous time savings. This process is showcased in a recent study performed by Principled Technologies with PowerEdge Servers with vSphere (not including vSAN).
How is it done with vLCM?
The first step is to import the ESXi ISO via the vLCM tab in the vCenter Server UI. Once uploaded, select the relevant cluster, ensure that the cluster profile (created in the initial baseline configuration phase) is associated with the cluster being updated. Now, you can apply the target configuration by editing the ESXi image and, from the OMIVV UI, choose the correct firmware and driver to apply to the hardware profile. Once a compliance scan is complete, you will have the option to remediate all hosts.
If there are multiple homogenous clusters you need to update, it can be as easy as using the same cluster profile to execute the cluster update against. However, if the next cluster has a different hardware configuration, then you would have to perform the above steps over again. Customers with varying hardware and software requirements for their clusters will have to repeat many of these steps, including the planning tasks, to ensure stack integrity.
How it is done with VxRail LCM?
With VxRail and Continuously Validated States, updating from one configuration to another is even simpler. You can access the VxRail Manager directly within the vCenter Server UI to initiate the update. The LCM operation automatically retrieves the update bundle from the VxRail repository, runs a full stack pre-update health check, and performs the cluster update.
With VxRail, performing multi-cluster updates is as simple as performing a single-cluster update. The same LCM cluster update workflow is followed. While different hardware configurations on separate clusters will add more labor for IT staff for vSAN Ready Nodes, this doesn’t apply to VxRail. In fact, in the latest release of our SaaS multi-cluster management capability set, customers can now easily perform cluster updates at scale from our cloud-based management platform, MyVxRail.
Figure 3 Executing a cluster update
Sustaining cluster integrity over the long term
The long-term integrity of a cluster outlasts the software and hardware in it. As mentioned earlier, because new releases are made available frequently, software has a very short life span. While hardware has more staying power, it won’t outlast some of the applications running on them. New hardware platforms will emerge. New hardware devices will enter the market that will launch new workloads, such as machine learning, graphics rendering, and visualization workflows. You will need the cluster to evolve non-disruptively to deliver the application performance, availability, and diversity your end-users require.
How is it done with vLCM?
In its current form, vLCM will struggle in long-term cluster lifecycle management. In particular, its inability to support heterogeneous nodes (nodes with different hardware configurations) in the same cluster will limit its application diversification and its ability to take advantage of new hardware platforms without impacting end-users.
How it is done with VxRail LCM?
VxRail LCM touts its ability to allow customers to grow non-disruptively and to scale their clusters over time. That includes adding non-identical nodes into the clusters for new applications, adding new hardware devices for new applications or more capacity, or retiring old hardware from the cluster.
Figure 4 Comparing vSphere LCM and VxRail LCM cluster update operations driven by the customer
The VMware vLCM approach empowers customers who are looking for more configuration flexibility and control. They have the option to select their own hardware components and firmware to build the cluster profile. With this freedom comes the responsibility to define the HCI stack and make investments in equipment and personnel to ensure stack integrity. vLCM supports this customer-driven approach with improvements in cluster update execution for faster outcomes.
Dell EMC VxRail LCM continues to take a more comprehensive approach to optimize operational efficiency from the point of the view of the customer. VxRail customers value its LCM capabilities because it reduces operational time and effort which can be diverted into other areas of need in IT. VxRail takes on the responsibility to drive stack integrity for the lifecycle management of the cluster with Continuously Validated States. And VxRail sustains stack integrity throughout the life of the cluster, allowing you to simply and predictably evolve with technology trends.
Related Blog Posts
HCI Security Simplified: Protecting Dell VxRail with VMware NSX Security
Fri, 08 Apr 2022 18:14:37 -0000|
Read Time: 0 minutes
Cybersecurity and protection against ransomware attacks are among the top priorities for most customers who have successfully implemented or are going through a digital transformation. According to the ESG’s 2022 Technology Spending Intentions Survey:
- 69 percent of respondents shared that their spending on cybersecurity will increase in 2022 (#1).
- 48 percent of respondents believe their IT organizations have a problematic shortage of existing skills in this area (#1).
- 38 percent of respondents believe that strengthening cybersecurity will drive the majority of technology spending in their organization in the next 12 months (#1).
The data clearly shows that this area is one of the top concerns for our customers today. They need solutions that significantly simplify increasing cybersecurity activities due to a perceived skills shortage.
It is worth reiterating the critical role that networking plays within Hyperconverged Infrastructure (HCI). In contrast to legacy three-tier architectures, which typically have a dedicated storage network and storage, HCI architecture is more integrated and simplified. Its design lets you share the same network infrastructure for workload-related traffic and intercluster communication with the software-defined storage. The accessibility of the running workloads (from the external network) depends on the reliability of this network infrastructure, and on setting it up properly. The proper setup also impacts the performance and availability of the storage and, as a result, the whole HCI system. To prevent human error, it is best to employ automated solutions to enforce configuration best practices.
VxRail as an HCI system supports VMware NSX, which provides tremendous value for increasing cybersecurity in the data center, with features like microsegmentation and AI-based behavioral analysis and prevention of threats. Although NSX is fully validated with VxRail as a part of VMware Cloud Foundation (VCF) on VxRail platform, setting it outside of VCF requires strong networking skills. The comprehensive capabilities of this network virtualization platform might be overwhelming for VMware vSphere administrators who are not networking experts. What if you only want to consume the security features? This scenario might present a common challenge, especially for customers who are deploying small VxRail environments with few nodes and do not require full VCF on the VxRail stack.
The great news is that VMware recognized these customer challenges and now offers a simplified method to deploy NSX for security use cases. This method fits the improved operational experience our customers are used to with VxRail. This experience is possible with a new VMware vCenter Plug-in for NSX, which we introduce in this blog.
NSX and security
NSX is a comprehensive virtualization platform that provides advanced networking and security capabilities that are entirely decoupled from the physical infrastructure. Implementing networking and security in software, distributed across the hosts responsible for running virtual workloads, provides significant benefits:
- Flexibility—Total flexibility for positioning workloads in the data center enables optimal use of compute resources (a key aspect of virtualization).
- Optimal consumption of CPU resources —Advanced NSX features only consume CPU from the hosts when they are used. This consumption leads to lower cost and simplified provisioning when compared to running the features on dedicated appliances.
- High performance—NSX features are performed in VMware ESXi kernel space, a unique capability on vSphere.
The networking benefits are evident for large deployments, with NSX running in almost all Fortune 100 companies and many medium scale businesses. In today’s world of widespread viruses, ransomware, and even cyber warfare, the security aspect of NSX built on top of the NSX distributed firewall (DFW) is relevant to vSphere customers, regardless of their size.
The NSX DFW is a software firewall instantiated on the vNICs of the virtual machines in the data center. Thanks to its inline position, it provides maximum filtering granularity because it can inspect the traffic coming in and going out of every virtual machine without requiring redirection of the traffic to a security appliance, as shown in the following figure. It also moves along with the virtual machine during vMotion and maintains its state.
Figure 1: Traditional firewall appliance compared to the NSX DFW
The NSX DFW state-of-the-art capabilities are configured centrally from the NSX Manager and allow implementing security policies independently of the network infrastructure. This method makes it easy to implement microsegmentation and compliance requirements without dedicating racks, servers, or subnets to a specific type of workload. With the NSX DFW, security teams can deploy advanced threat prevention capabilities such as distributed IDS/IPS, network sandboxing, and network traffic analysis/network detection and response (NTA/NDR) to protect against known and zero-day threats.
A dedicated solution for security
Many NSX customers who are satisfied with the networking capability of vSphere run their production environment on a VDS with VLAN-backed dvportgroups. They deploy NSX for its security features only, and do not need its advanced networking components. Until now, those customers had to migrate their virtual machines to NSX-backed dvportgroups to benefit from the NSX DFW. This migration is easy but managing networking from NSX modifies the workflow of all the teams, including those teams that are not concerned by security:
Figure 2: Traditional NSX deployment
Starting with NSX 3.2, you can run NSX security on a regular VDS, without introducing the networking components of NSX. The security team receives all the benefits of NSX DFW, and there is no impact to any other team:
Figure 3: NSX Security with vCenter Plugin
Even better, NSX can now integrate further with vCenter, thanks to a plug-in that allows you to configure NSX from the vCenter UI. This method means that NSX can be consumed as a simple security add-on for a traditional vSphere deployment.
How to deploy and configure NSX Security
First, we need to ensure that our VxRail environment meets the following requirements:
- vCenter Server 7.0 U3c (included with VxRail 7.0.320)
- VDS 6.7 or later
- The OVA for NSX-T with the vCenter Plugin version 3.2 or later and an appropriate NSX license
Deploy the NSX Manager and the NSX DFW on ESXi hosts
Running NSX in a vSphere environment consists of deploying a single NSX Manager virtual machine protected by vSphere HA. A shortcut in vCenter enables this step:
Figure 4: Deploy the NSX Manager appliance virtual machine from the NSX tab in vCenter
When the NSX Manager is up and running, it sets up a one-to-one association with vCenter and uploads the plug-in that presents the NSX UI in vCenter, as if NSX security is part of vCenter. The vCenter administrator becomes an effective NSX security administrator.
The next step, performed directly from the vCenter UI, is to enter the NSX license and select the cluster on which to install the NSX DFW binaries:
Figure 5: Select the clusters that will receive the NSX DFW binaries
After the DFW binaries are installed on the ESXi hosts, the NSX security is deployed and operational. You can exit the security configuration wizard (and configure directly from the NSX view in the vCenter UI) or let the wizard run.
Run the security configuration wizard
After installing the NSX binaries on the ESXi hosts, the plug-in runs a wizard that guides you through the configuration of basic security rules according to VMware best practices. The wizard gives the vSphere administrator simple guidance for implementing a baseline configuration that the security team can build on later. There are three different steps in this guided workflow.
First step—Segment the data center in groups
Perform the following steps, as shown in the following figure:
- Create an infrastructure group, identifying the services that the workloads in the data center will access. These services typically include DNS, NTP, DHCP servers, and so on.
- Segment the data center coarsely in environments, such as groups like Development, Production, and DMZ.
- Segment the data center finely by identifying applications running across the different environments.
Figure 6: Example of group creation
Second step—Define communication between different groups
Perform the following steps, as shown in the following figure:
- Define which groups can access the infrastructure services
- Define how the different environments communicate with each other
- Define how applications communicate with each other
Figure 7: Define the communication between environments using a graphcial represenation
Third step—Review the configuration and publish it to the NSX DFW
After reviewing the configuration, publish the configuration to NSX:
Figure 8: Review DFW rules before exiting the wizard
The full NSX UI is now available in vCenter. Select the NSX tab to access the NSX UI directly.
The new VMware vCenter Plug-in for NSX drastically simplifies the deployment and adoption of NSX with VxRail for security use cases. In the past, advanced knowledge of the network virtualization platform was required. A vSphere adminstrator can now deploy it easily, using an intuitive configuration wizard available directly from vCenter.
The VMware vCenter Plug-in for NSX provides the kind of simplified and optimized experience that VxRail customers are used to when managing their HCI environment. It also addresses the challenge that customers face today, improving security even with a perceived shortage of skills in this area. Also, it can be configured easily and quickly, making the robust NSX security features more available for smaller HCI deployments.
VMworld 2021 Session: NET1483 - Deploy and Manage NSX-T via vCenter: A Single Console to Drive VMware SDDC
Francois Tallet, Technical Product Manager, VMware
Karol Boguniewicz, Senior Principal Engineering Technologist, Dell Technologies
New VxRail Node Lets You Start Small with Greater Flexibility in Scaling and Additional Resiliency
Mon, 29 Aug 2022 19:00:25 -0000|
Read Time: 0 minutes
When deploying infrastructure, it is important to know two things: current resource needs and that those resource needs will grow. What we don’t always know is in what way the demands for resources will grow. Resource growth is rarely equal across all resources. Storage demands will grow more rapidly than compute, or vice-versa. At the end of the day, we can only make an educated guess, and time will tell if we guessed right. We can, however, make intelligent choices that increase the flexibility of our growth options and give us the ability to scale resources independently. Enter the single processor Dell VxRail P670F.
The availability of the P670F with only a single processor provides more growth flexibility for our customers who have smaller clusters. By choosing a less compute dense single processor node, the same compute workload will require more nodes. There are two benefits to this:
- More efficient storage: More nodes in the cluster opens the door to using the more capacity efficient erasure coding vSAN storage option. Erasure coding, also known as parity RAID, (such as RAID 5 and RAID 6) has a capacity overhead of 33% compared to the 100% overhead that mirroring requires. Erasure coding can deliver 50% more usable storage capacity while using the same amount of raw capacity. While this increase in storage does come with a write performance penalty, VxRail with vSAN has shown that the gap between erasure coding and mirroring has narrowed significantly, and provides significant storage performance capabilities.
- Reduced cluster overhead: Clusters are designed around N+1, where ‘N’ represents sufficient resources to run the preferred workload, and ‘+1’ are spare and unused resources held in reserve should a failure occur in the nodes that make up the N. As the number of nodes in N increases, the percentage of overall resources that are kept in reserve to provide the +1 for planned and unplanned downtime drops.
Figure 1: Single processor P670F disk group options
You may be wondering, “How does all of this deliver flexibility in the options for scaling?”
You can scale out the cluster by adding a node. Adding a node is the standard option and can be the right choice if you want to increase both compute and storage resources. However, if you want to grow storage, adding capacity drives will deliver that additional storage capacity. The single processor P670F has disk slots for up to 21 capacity drives with three cache drives, which can be populated one at a time, providing over 160TB of raw storage. (This is also a good time to review virtual machine storage policies: does that application really need mirrored storage?) The single processor P670F does not have a single socket motherboard. Instead, it has the same dual socket motherboard as the existing P670F—very much a platform designed for expanding CPU and memory in the future.
If you are starting small, even really small, as in a 2-node cluster (don’t worry, you can still scale out to 64 nodes), the single processor P670F has even more additional features that may be of interest to you. Our customers frequently deploy 2-node clusters outside of their core data center at the edge or at remote locations that can be difficult to access. In these situations, the additional data resiliency that provided by Nested Fault Domains in vSAN is attractive. To provide this additional resiliency on 2-node clusters requires at least three disk groups in each node, for which the single processor P670F is perfectly suited. For more information, see VMware’s Teodora Hristov blog post about Nested fault domain for 2 Node cluster deployments. She also posts related information and blog posts on Twitter.
It is impressive how a single change in configuration options can add so much more configuration flexibility, enabling you to optimize your VxRail nodes specifically to your use cases and needs. These configuration options impact your systems today and as you scale into the future.
Author: David Glynn, Sr. Principal Engineer, VxRail Technical Marketing