Satellite nodes: Because sometimes even a 2-node cluster is too much
Tue, 01 Mar 2022 15:03:31 -0000
Wait a minute, where's me cluster? Oh no.
You may have noticed a different approach from Dell EMC VxRail in Daniel Chiu’s blog A Taste of VxRail Deployment Flexibility. In short, we are extending the value of VxRail into new adjacencies, into new places, and new use cases. With the release of VxRail dynamic nodes in September, these benefits became a new reality in the landscape of VxRail deployment flexibility:
- Using VxRail for compute clusters with vSAN HCI Mesh
- Using storage arrays with VxRail dynamic nodes in VMware Cloud Foundation on VxRail
- Extending the benefits of VxRail HCI System Software to traditional 3-tier architectures using Dell EMC for primary storage
The newest adjacency in 7.0.320 is the VxRail satellite node, as sometimes even a 2-node cluster may be too much.
A VxRail satellite node is ideal for those workloads where the SLA and compute demands do not justify even the smallest of 2-node clusters – in the past you might have even recycled a desktop to meet these requirements. Think retail and ROBO with their many distributed sites, or 5G with its “shared nothing” architecture. But in today’s IT environment, out of sight cannot mean out of mind. Workloads are everywhere and anywhere. The datacenter and the public cloud are just two of the many locations where workloads exist, and compute is needed. These infrastructure needs are well understood, and in the case of public cloud – out of scope. The challenge for IT is managing and maintaining the growing and varied infrastructure demands of workloads outside of the data center, like the edge, in its many different forms. The demands of the edge vary greatly. But even with infrastructure needs met with a single server, IT is still on the hook for managing and maintaining it.
While satellite nodes are a single node extension of VxRail, they are managed and life cycled by the VxRail Manager from a VxRail with vSAN cluster. Targeted at existing VxRail customers, these single nodes should not be thought of as lightweights. We’re leveraging the existing VxRail E660, E660F, and V670F with all their varied hardware options, and have added support for the PERC H755 adapter for local RAID protected storage. This provides options as lightweight as an E660 with an eight-core Intel Xeon Gen 3 Scalable processor and spinning disks, all the way up to a V670F with dual 40-core Intel Xeon Gen 3 Scalable processors, accelerated by a pair of NVIDIA Ampere A100 80GB Data Center GPUs, and over 150 TB of flash storage. Because edge workloads come in all sizes from small to HUUUUGE!!!
Back when I started in IT, a story about a missing Novell server discovered after four years sealed behind a wall was making the rounds. While it was later claimed to be false, it was a story that resonated with many seasoned IT professionals and continues to do so today. Regardless of where a workload is running, the onus is on IT not only to protect that workload, but also to protect the network, all the other workloads on the network, and anything that might connect to that workload. This is done in layers, with firewalls, DMZ, VPN, and so on. But it is also done by keeping hypervisors updated, and BIOS and firmware up to date.
For six years, VxRail HCI System Software has been helping virtualization administrators keep their VxRail with vSAN clusters up to date, regardless of where they are in the world, be it at a remote monitoring station, running a grocery store, or in the dark sealed up behind a wall. VxRail satellite nodes and VxRail dynamic nodes extend the VxRail operating model into new adjacencies. We are enabling you, our customers, to manage and life cycle these ever-growing and diverse workloads with the click of a button.
Also in the VxRail 7.0.320 release are two notable stand-outs. The first is validation of Dell EMC PowerFlex scale-out SDS as an option for use with VxRail dynamic nodes. The second is increased resilience for vSAN 2-node clusters (which also applies to stretched clusters), which are often used at the edge. Both John Nicholson and Teodora Hristov of VMware do a great job of explaining the nuts and bolts of this useful addition. But I want to reiterate that for 2-node deployments, this increased resilience will require that each node have three disk groups.
Don’t let the fact that a workload is too small, or too remote, or not suited to HCI, be the reason for your company to be at risk by running out-of-date firmware and BIOS. There is more flexibility than ever with VxRail, much more, and the value of VxRail’s automation and HCI System Software can now be extended to the granularity of a single node deployment.
Related Blog Posts
New VxRail Node Lets You Start Small with Greater Flexibility in Scaling and Additional Resiliency
Mon, 29 Aug 2022 19:00:25 -0000
When deploying infrastructure, it is important to know two things: current resource needs and that those resource needs will grow. What we don’t always know is in what way the demands for resources will grow. Resource growth is rarely equal across all resources. Storage demands will grow more rapidly than compute, or vice-versa. At the end of the day, we can only make an educated guess, and time will tell if we guessed right. We can, however, make intelligent choices that increase the flexibility of our growth options and give us the ability to scale resources independently. Enter the single processor Dell VxRail P670F.
The availability of the P670F with only a single processor provides more growth flexibility for our customers who have smaller clusters. By choosing a less compute dense single processor node, the same compute workload will require more nodes. There are two benefits to this:
- More efficient storage: Having more nodes in the cluster opens the door to using the more capacity efficient erasure coding vSAN storage option. Erasure coding, also known as parity RAID (such as RAID 5 and RAID 6), has a capacity overhead as low as 33% (for RAID 5), compared to the 100% overhead that mirroring requires. At that overhead, erasure coding delivers 50% more usable storage capacity from the same amount of raw capacity. While this increase in storage does come with a write performance penalty, VxRail with vSAN has shown that the gap between erasure coding and mirroring has narrowed significantly, and provides significant storage performance capabilities.
- Reduced cluster overhead: Clusters are designed around N+1, where ‘N’ represents sufficient resources to run the preferred workload, and ‘+1’ are spare and unused resources held in reserve should a failure occur in the nodes that make up the N. As the number of nodes in N increases, the percentage of overall resources that are kept in reserve to provide the +1 for planned and unplanned downtime drops.
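The arithmetic behind both benefits can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names are hypothetical, the RAID 5 layout is assumed to be three data components plus one parity component (the layout that yields the 33% overhead quoted above), and all nodes in the cluster are assumed to be identical.

```python
from fractions import Fraction


def capacity_overhead(data_parts: int, redundancy_parts: int) -> Fraction:
    """Extra raw capacity consumed per unit of usable data."""
    return Fraction(redundancy_parts, data_parts)


def usable_fraction(data_parts: int, redundancy_parts: int) -> Fraction:
    """Share of raw capacity that ends up holding usable data."""
    return Fraction(data_parts, data_parts + redundancy_parts)


def reserve_share(workload_nodes: int) -> Fraction:
    """Share of an N+1 cluster held in reserve, assuming identical nodes."""
    return Fraction(1, workload_nodes + 1)


mirroring = (1, 1)  # one copy of the data plus one mirror copy
raid5 = (3, 1)      # assumed layout: three data components plus one parity

print(f"Mirroring overhead: {float(capacity_overhead(*mirroring)):.0%}")
print(f"RAID 5 overhead:    {float(capacity_overhead(*raid5)):.0%}")

# Usable capacity gained by RAID 5 over mirroring on the same raw capacity.
gain = usable_fraction(*raid5) / usable_fraction(*mirroring) - 1
print(f"Extra usable capacity with RAID 5: {float(gain):.0%}")

# The reserve shrinks as the number of workload nodes (N) grows.
for n in (1, 2, 3, 7, 15):
    print(f"N={n:>2} (+1): {float(reserve_share(n)):.1%} of resources held in reserve")
```

Under these assumptions, a 2-node cluster (N=1) holds half of its resources in reserve, while an 8-node cluster (N=7) holds one eighth.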
Figure 1: Single processor P670F disk group options
You may be wondering, “How does all of this deliver flexibility in the options for scaling?”
You can scale out the cluster by adding a node. Adding a node is the standard option and can be the right choice if you want to increase both compute and storage resources. However, if you want to grow storage, adding capacity drives will deliver that additional storage capacity. The single processor P670F has disk slots for up to 21 capacity drives with three cache drives, which can be populated one at a time, providing over 160TB of raw storage. (This is also a good time to review virtual machine storage policies: does that application really need mirrored storage?) The single processor P670F does not have a single socket motherboard. Instead, it has the same dual socket motherboard as the existing P670F—very much a platform designed for expanding CPU and memory in the future.
If you are starting small, even really small, as in a 2-node cluster (don’t worry, you can still scale out to 64 nodes), the single processor P670F has additional features that may be of interest to you. Our customers frequently deploy 2-node clusters outside of their core data center at the edge or at remote locations that can be difficult to access. In these situations, the additional data resiliency that is provided by Nested Fault Domains in vSAN is attractive. Providing this additional resiliency on 2-node clusters requires at least three disk groups in each node, for which the single processor P670F is perfectly suited. For more information, see the blog post by VMware’s Teodora Hristov about Nested fault domain for 2 Node cluster deployments. She also posts related information and blog posts on Twitter.
It is impressive how a single change in configuration options can add so much more configuration flexibility, enabling you to optimize your VxRail nodes specifically to your use cases and needs. These configuration options impact your systems today and as you scale into the future.
Author: David Glynn, Sr. Principal Engineer, VxRail Technical Marketing
I feel the need – the need for speed (and endurance): Intel Optane edition
Wed, 13 Oct 2021 17:37:52 -0000
It has only been three short months since we launched VxRail on 15th Generation PowerEdge, but we're already expanding the selection of configuration offerings. So far we've added 18 additional processors to power your workloads, including some high frequency and low core count options (delightful news for those with applications that are licensed per core), an additional NVIDIA GPU (the A30), a slew of additional drives, and doubled the RAM capacity to 8TB. I've probably missed something, as it can be hard to keep up with all the innovations taking place within this race car that is VxRail!
In my last blog, I hinted at one of those drive additions: faster cache drives. Today I'm excited to announce that you can now order, and turbo charge your VxRail with, the 400GB or 800GB Intel P5800X – Intel’s second generation Optane NVMe drive. Before we delve into some of the performance numbers, let’s discuss what it is about the Optane drives that makes them so special. More specifically, what is it about them that enables them to deliver so much more performance, in addition to significantly higher endurance rates?
To grossly over-simplify it, and my apologies in advance to the Intel engineers who poured their lives into this: when writing to NAND flash, an erase cycle needs to be performed before a write can be made. These erase cycles are time-consuming operations and the main reason why random write IO capability on NAND flash is often a fraction of the read capability. Additionally, garbage collection runs continuously in the background to ensure that there is space available for incoming writes. Optane, on the other hand, does bit-level write-in-place operations, so it doesn’t require an erase cycle or garbage collection, and it avoids the associated write performance penalty. Hence, random write IO capability almost matches the random read IO capability.
So just how much better is endurance with this new Optane drive? Endurance can be measured in Drive Writes Per Day (DWPD), which measures how many times the drive's entire capacity could be overwritten each day of its warranty life. For the 1.6TB NVMe P5600 this is 3 DWPD, or 55 MB per second, every second for five years – just shy of 9PB of writes, not bad. However, the 800GB Optane P5800X is rated at 100 DWPD: it will endure 146PB of writes over its five-year warranty life, or almost 1 GB per second (926 MB/s), every second for five years. Not quite indestructible, but that is a lot of writes, so much so that you don’t need extra capacity for wear leveling and a smaller capacity drive will suffice.
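For anyone who wants to check the endurance figures, here is a minimal sketch of the DWPD arithmetic. The function names are hypothetical, and the calculation assumes decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000 TB), a 365-day year, and a five-year warranty as stated above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def lifetime_writes_pb(capacity_tb: float, dwpd: float, warranty_years: int = 5) -> float:
    """Total data that can be written over the warranty life, in PB."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * warranty_years / 1000


def sustained_write_mb_s(capacity_tb: float, dwpd: float) -> float:
    """Constant write rate, in MB/s, that would consume exactly the rated DWPD."""
    return capacity_tb * 1_000_000 * dwpd / SECONDS_PER_DAY


# Capacities and DWPD ratings as quoted in the text above.
drives = {
    "P5600 1.6TB": (1.6, 3),
    "P5800X 800GB": (0.8, 100),
}

for name, (capacity_tb, dwpd) in drives.items():
    print(
        f"{name}: {lifetime_writes_pb(capacity_tb, dwpd):.2f} PB lifetime writes, "
        f"{sustained_write_mb_s(capacity_tb, dwpd):.1f} MB/s sustained"
    )
```

The results line up with the figures quoted above: roughly 8.76 PB and 55.6 MB/s for the P5600, and 146 PB and 925.9 MB/s for the P5800X.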
You might wonder why you should care about endurance, as Dell EMC will replace the drive under warranty anyway. There are three reasons. First, when a cache drive fails, its disk group is taken offline, so not only have you lost performance and capacity, your environment is also taking on the additional burden of a rebuild operation to re-protect your data. Second, more and more systems are being deployed outside of the core data center. Replacing a drive in your data center is straightforward, and you might even have spares onsite, but what about outside of your core data center? What is your plan for replacing a drive at a remote office, or a thousand miles away? What if that remote location is not an office but an oil rig one hundred miles offshore, or a cruise ship halfway around the world where the cost of getting a replacement drive there is not trivial? In these remote locations, onsite spares are commonplace, but the exceptions are what lead me to the third reason, Murphy's Law. IT and IT staffing might be an afterthought at these remote locations. Getting a failed drive swapped out at a remote location which lacks true IT staffing may not get the priority it deserves, and then there is that ever-present risk of user error... “Oh, you meant the other drive?!? Sorry...”
Cache in its many forms plays an important role in the datacenter. Cache enables switches and storage to deliver higher levels of performance. On VxRail, our cache drives fall into two categories, SAS and NVMe, with NVMe delivering up to 35% higher IOPS and 14% lower latency. Among our NVMe cache drives we have two from Intel: the 1.6TB P5600 and the Optane P5800X, in 400GB and 800GB capacities. The links for each will bring you to the drive specification including performance details. But how does the performance at a drive level impact performance at the solution level? At the end of the day, solution-level performance is what your application consumes, after cache mirroring, network hops, and the vSAN stack. Intel is a great partner to work with: when we checked with them about publishing solution level performance data comparing the two drives side-by-side, they were all for it.
In my over-simplified explanation above, I described how the write cycle for Optane drives is significantly different, as an erase operation does not need to be done first. So how does that play out in a full solution stack? Figure 1 compares the two cache drives in a four node VxRail P670F cluster running a 100% sequential write 64KB workload. Not a test that reflects any real-world workload, but one that really stresses the vSAN cache layer, highlights the consistent write performance that 3D XPoint technology delivers, and shows how Optane is able to de-stage cache when it fills up without compromising performance.
Figure 1: Optane cache drives deliver consistent and predictable write performance
When we look at performance, there are two numbers to keep in mind: IOPS and latency. The target is to have high IOPS with low and predictable latency, at a real-world IO size and read:write ratio. To that end, let’s look at how VxRail performance differs with the P5600 and P5800X under OLTP32K (70R30W) and RDBMS (60R40W) benchmark workloads, as shown in Figure 2.
Figure 2: Optane cache drives deliver higher performance and lower latency across a variety of workload types.
It doesn't take an expert to see that with the P5800X this four node VxRail P670F cluster's peak performance is significantly higher than when it is equipped with the P5600 as a cache drive. For RDBMS workloads, it delivers up to 44% higher IOPS with a 37% reduction in latency. But peak performance isn't everything. Many workloads, particularly databases, place a higher importance on latency requirements. What if our workload, database or otherwise, requires 1ms response times? Maybe this is the Service Level Agreement (SLA) that the infrastructure team has with the application team. In such a situation, based on the data shown, and for an OLTP 70:30 workload with a 32K block size, the VxRail cluster would deliver over twice the performance at the same latency SLA, going from 147,746 to 314,300 IOPS.
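As a quick sanity check on "over twice the performance", the ratio of the two IOPS figures can be computed directly. This is a minimal sketch; the helper name is hypothetical and the only inputs are the two numbers quoted above.

```python
def speedup(baseline_iops: int, improved_iops: int) -> float:
    """Ratio of improved to baseline throughput at the same latency target."""
    return improved_iops / baseline_iops


# IOPS at the 1ms latency SLA for the OLTP 70:30, 32K workload, as quoted above.
p5600_iops = 147_746
p5800x_iops = 314_300

ratio = speedup(p5600_iops, p5800x_iops)
print(f"P5800X delivers {ratio:.2f}x the IOPS of the P5600 ({ratio - 1:.0%} more)")
```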
In the datacenter, as in life, we are often faced with "Good, fast, or cheap. Choose two." When you compare the price tag of the P5600 and P5800X side by side, the Optane drive has a significant premium for its good and fast. However, keep in mind that you are not buying an individual drive, you are buying a full stack solution of several pieces of hardware and software, where the cost of the premium pales in comparison to the increased endurance and performance. Whether you are looking to turbo charge your VxRail like a racecar, or make it as robust as a tank, Intel Optane SSD drives will get you both.
David Glynn, Technical Marketing Engineer, VxRail at Dell Technologies
LinkedIn: David Glynn