Improved performance and user densities with Dell EMC Ready Solutions for VDI, powered by 2nd Generation Intel Xeon Scalable processors
Mon, 03 Aug 2020 16:17:48 -0000
Published on May 15, 2019 by Anand Johnson, Principal Engineer
Performance is one of the key factors required for a successful Virtual Desktop Infrastructure (VDI) deployment. It is often challenging for IT teams to measure performance and perform capacity planning for their environments without proper benchmarking tests. With the discovery of the Spectre, Meltdown, and L1 Terminal Fault vulnerabilities in 2018, VDI support teams wanted to know the potential negative performance impact these vulnerabilities might have on their environments. A VMware benchmark study performed in August 2018 found that VDI systems patched for L1TF/Foreshadow had a performance degradation of up to 30%.
Spectre, Meltdown, and L1 Terminal Fault are all types of side-channel vulnerabilities, meaning that they exploit observable side effects of how modern processors operate, such as timing information, power consumption, electromagnetic leaks, and emitted sounds, to gain information for use in an attack. Mitigations provided in the form of operating system patches, hypervisor patches, or microcode updates address these vulnerabilities indirectly. However, hardware-level fixes are preferable because addressing these vulnerabilities at the kernel or software level is likely to have a negative impact on system performance.
The recently announced 2nd Generation Intel Xeon Scalable processors (Cascade Lake) include fixes in the silicon for Spectre (variant 2), Meltdown (variant 3), and L1 Terminal Fault side-channel methods. These fixes mean that the new processors are expected to provide better performance than first-generation Intel Xeon Scalable processors (Skylake) or other previous-generation processors, which still require software-level fixes to protect against side-channel vulnerabilities. Cascade Lake processors also come with an improved architecture and higher thermal efficiency that boost system performance.
Dell EMC Ready Solutions for VDI on both VxRail and vSAN ReadyNodes are now available with 2nd Generation Intel Xeon Scalable processors (Cascade Lake). Cascade Lake processors have more benefits beyond better security, which you can read about here. The Dell EMC VDI engineering team performed a comparison test between the Intel Xeon Gold 6248 processor (Cascade Lake) and the previous generation’s Intel Xeon Gold 6138 processor (Skylake), with the results described below. I think that’s enough of an introduction; let’s get to the meat of the testing.
Test environment
The Dell EMC VDI engineering team performed tests with Login VSI, an industry standard for benchmarking VDI workloads. The following workloads were tested:
- Knowledge Workload running on VMs configured with 2 vCPUs and 4 GB RAM
- Power Workload running on VMs configured with 4 vCPUs and 8 GB RAM
The test bed environment was a 3-node cluster of VxRail V570F that was optimized for VDI workloads. The cluster was configured and tested with Skylake processors and then with Cascade Lake processors. Environment configuration was:
- PowerEdge R740xd servers
  - Intel Xeon Gold 6138, 2 x 20-core, 2.0 GHz processors (Skylake testing)
  - Intel Xeon Gold 6248, 2 x 20-core, 2.5 GHz processors (Cascade Lake testing)
  - 768 GB memory (24 x 32 GB @ 2400 MHz) (Skylake testing)
  - 768 GB memory (24 x 32 GB @ 2666 MHz) (Cascade Lake testing)
- vSAN hybrid data store using an SSD caching tier
- VMware ESXi 6.7 hypervisor
- VMware Horizon 7.7 VDI software layer
Compute virtual machines are Windows 10, 64-bit, version 1803. One of the VxRail cluster nodes hosted both management and compute virtual machines. The other two nodes were dedicated for workload compute. Figure 1 shows the main components.
Figure 1: Horizon VDI test environment: Main components
For more details about the environment, configuration, and testing methodology, see the Design and Validation Guides under "Designs for VMware Horizon on VxRail and vSAN Ready Nodes" on our VDI Info Hub for Ready Solutions website.
The threshold for CPU, memory, and network usage was set to a conservative 85%. A disk latency threshold of 20 ms was also set. The consensus of the team, based on field experience, is that if these thresholds are exceeded, the user experience might noticeably degrade.
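To make the thresholds concrete, here is a minimal Python sketch that flags which metrics exceed them. The function name, the metric names, and the sample values are assumptions for illustration only; they are not part of the Dell EMC test tooling.

```python
# Thresholds taken from the test design described above.
PERCENT_THRESHOLD = 85.0          # CPU, memory, and network usage (%)
DISK_LATENCY_MS_THRESHOLD = 20.0  # disk latency (ms)


def exceeded_thresholds(cpu_pct, memory_pct, network_pct, disk_latency_ms):
    """Return the names of the metrics that exceed their threshold."""
    exceeded = []
    percent_metrics = (
        ("cpu", cpu_pct),
        ("memory", memory_pct),
        ("network", network_pct),
    )
    for name, value in percent_metrics:
        if value > PERCENT_THRESHOLD:
            exceeded.append(name)
    if disk_latency_ms > DISK_LATENCY_MS_THRESHOLD:
        exceeded.append("disk_latency")
    return exceeded


if __name__ == "__main__":
    # Hypothetical steady-state averages, not measured results.
    print(exceeded_thresholds(cpu_pct=87.0, memory_pct=60.0,
                              network_pct=10.0, disk_latency_ms=3.0))
```

A value exactly at a threshold is treated as acceptable here; whether to use a strict or inclusive comparison is a design choice you may want to adjust for your own monitoring.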
User density and resource usage metrics
Refer to Figure 2 for the user density results obtained from the Cascade Lake vs. Skylake testing for both the Login VSI power and knowledge workloads.
Figure 2: User Density Per Compute Node
Cascade Lake processors outperformed Skylake processors in user density per compute node by approximately 62% for the knowledge workload and by 54% for the power workload. The CPU steady-state average stayed in the range of 84-87% for all the test cases, which is close to the threshold parameter set during test design. Memory, network usage, and disk latency showed no sign of causing bottlenecks over the duration of the tests. For a detailed analysis report and more resource usage metric results, see the Validation Guide under "Designs for VMware Horizon on VxRail and vSAN Ready Nodes" on our VDI Info Hub for Ready Solutions website.
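As a rough illustration of how a density uplift percentage relates to per-node user counts and to the number of compute nodes you would need, consider the following Python sketch. The function names and all input numbers are hypothetical placeholders, not the measured densities from Figure 2.

```python
import math


def density_uplift_percent(baseline_users_per_node, new_users_per_node):
    """Percentage increase in users per node relative to the baseline."""
    if baseline_users_per_node <= 0:
        raise ValueError("baseline_users_per_node must be positive")
    delta = new_users_per_node - baseline_users_per_node
    return delta * 100.0 / baseline_users_per_node


def nodes_required(total_users, users_per_node):
    """Smallest number of compute nodes that can host total_users."""
    if users_per_node <= 0:
        raise ValueError("users_per_node must be positive")
    return math.ceil(total_users / users_per_node)


if __name__ == "__main__":
    # Placeholder densities chosen only to show the arithmetic.
    baseline, improved = 100, 162
    print(density_uplift_percent(baseline, improved))
    print(nodes_required(1000, baseline), nodes_required(1000, improved))
```

The node count ignores management overhead and failover capacity, which a real sizing exercise would need to add.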
Login VSI Overview
You can find background information about Login VSI analysis on the Login VSI website. Refer to Figure 3 for Login VSI response time metrics for each test case.
Figure 3: Login VSI user experience summary
The VSIBase rating for Cascade Lake testing is “very good” (0-799 ms), and the rating for Skylake testing is “good” (800-1199 ms). VSIBase is the response time of the system before it is loaded with any user sessions. A lower VSIBase value indicates that the performance of the base image is better.
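The two rating bands quoted above can be expressed as a small lookup. This Python sketch covers only the bands mentioned in this post; the function name is an assumption, and any value outside those bands is reported as not covered rather than guessed.

```python
def vsibase_rating(vsibase_ms):
    """Map a VSIBase value (ms) to the rating bands quoted in this post."""
    if vsibase_ms < 0:
        raise ValueError("vsibase_ms must be non-negative")
    if vsibase_ms < 800:
        return "very good"   # 0-799 ms
    if vsibase_ms < 1200:
        return "good"        # 800-1199 ms
    return "not covered by the bands quoted here"


if __name__ == "__main__":
    # Sample values for illustration only, not measured results.
    for sample in (650, 950, 1500):
        print(sample, vsibase_rating(sample))
```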
VSIMax, the number of sessions that can be active on a system before the system is saturated, was not reached during any of the tests. VSIThreshold, which is the saturation point of the environment, was never reached either. With steady-state CPU utilization at 84-87%, we noticed that VSIMax Average (the average response time for the whole system) was in the range of 1000-1100 ms for all tests. Considering the importance of the user experience, it is best not to exceed a threshold of 85% for average CPU utilization during testing. Therefore, the recommended density from our testing is constrained by the thresholds we set for system resources during test design.
Lastly, there were no “stuck sessions” reported during testing, indicating that the system was not overloaded at any point in time.
Our engineering testing suggests that if performance and scalability are, or are expected to be, an issue in your environment, upgrading to Cascade Lake processors might help you. Each environment is unique, so we recommend that you benchmark your environment before you make any major changes to your production system to minimize any potential negative performance impacts. With proper benchmarking and by leveraging a tested and validated solution like Dell EMC Ready Architectures for VDI, the performance of your VDI environment does not have to be a challenge. For details about Dell EMC benchmark tests, refer to Ready Architectures for VDI on the VDI Info Hub for Ready Solutions website.
In the next blog, we will explore NVIDIA GPUs and vGPU software as part of the Dell EMC Ready Solutions for VDI.
Published By
Anand Johnson
Principal Engineer at Dell EMC, Technical Marketing, Ready Solutions for VDI
Related Blog Posts
Next-Generation Graphics Acceleration for Digital Workplaces from Dell EMC and NVIDIA
Fri, 09 Dec 2022 13:58:56 -0000
Originally published June 2019
For most organizations undergoing a digital transformation, maintaining a good user experience on virtual desktops—an essential component of digital workplaces—is a challenge. Users naturally compare their new virtual desktop experience to their previous physical endpoint experience. As the user experience continues to gain importance in digital workplaces (see this blog for more information), it is essential that virtualized environments keep pace with growing demands for user experience improvements.
This focus on the new user experience is being addressed by developers of modern-day operating systems and applications, who strive to meet the high expectations of their consumers. For example, the Windows 10 operating system, which plays a significant role in today's digital transformation initiatives, is more graphics-intensive than its predecessors. A study by Lakeside Software's SysTrack Community showed a 32 percent increase in graphics requirements when you move from Windows 7 to Windows 10. Microsoft Office applications (PowerPoint, Outlook, Excel, and so on), Skype for Business collaboration software, and all modern-day web browsers are designed to use more graphics acceleration in their newest releases.
Dell EMC Ready Solutions for VDI with NVIDIA Tesla T4 GPU
Dell EMC Ready Solutions for VDI, coupled with NVIDIA GRID Virtual PC (GRID vPC) and Virtual Apps (GRID vApps) software, provide comprehensive graphics acceleration solutions for your desktop virtualization workloads. The core of the NVIDIA GRID software is NVIDIA vGPU technology. This technology creates virtual GPUs, which enable sharing of the underlying GPU hardware among multiple users or virtual desktops running concurrently on a single host. This video compares the quality of a “CPU-only” VDI desktop with a VDI desktop powered by NVIDIA vGPU technology.
The latest NVIDIA GPU offering that supports virtualization is the NVIDIA Tesla T4, which is a universal GPU that can cater to a variety of workloads. The Tesla T4 comes with 16 GB of GDDR6 memory. It operates at 70 W, providing higher energy efficiency and lower operating costs than its predecessors, and has a single-slot PCIe form factor. You can configure up to six Tesla T4s in a single Dell EMC PowerEdge R740xd server, providing the highest density for GPU-accelerated VMs in a Dell EMC server. For more details about the NVIDIA Tesla T4 GPU, see the Tesla T4 for Virtualization Technology Brief.
Image courtesy NVIDIA Corporation
Figure 1. NVIDIA vGPU technology stack
Tesla T4 vs. earlier Tesla GPU cards
Let's compare the NVIDIA Tesla T4 with other widely used cards—the NVIDIA Tesla P40 and the NVIDIA Tesla M10.
Tesla T4 vs. Tesla P40:
- The Tesla T4 comes with a maximum framebuffer of 16 GB. In a PowerEdge R740xd server, T4 cards can provide up to 96 GB of memory (16 GB x 6 GPUs), compared to the maximum 72 GB provided by the P40 cards (24 GB x 3 GPUs). So, for higher user densities and cost efficiency, the Tesla T4 is a better option in VDI workloads.
- You might have to sacrifice the 3 GB, 6 GB, 12 GB, and 24 GB profiles when using the T4, but the 2 GB and 4 GB profiles, which are the most tested and configured profiles in VDI workloads, work well with the Tesla T4. However, for NVIDIA Quadro vDWS use cases that require higher memory per profile, the Tesla P40 is the recommended choice.
Tesla T4 vs. Tesla M10:
- In a PowerEdge R740xd server, three Tesla M10 cards give you the same 96 GB of memory as six Tesla T4 cards. However, when it comes to power consumption, the six Tesla T4 cards consume only 420 W (70 W x 6 GPUs), while the three Tesla M10 cards consume 675 W (225 W x 3 cards), a substantial difference of 255 W per server. Compared to the Tesla M10, the Tesla T4 provides power savings, reducing your data center operating costs.
- Tesla M10 cards support a 512 MB profile, which is not supported by the Tesla T4. However, the 512 MB profile is not a viable option in today’s workplaces, where graphics-intensive Windows 10 operating systems, multi-monitor setups, and 4K monitors are prevalent.
The following table provides a summary of the Tesla T4, P40, and M10 cards.
Table 1. Comparison of NVIDIA Tesla T4, P40 & M10
GPU | Form factor | GPUs/board | Memory size | vGPU profiles | Power |
T4 | PCIe 3.0 single slot | 1 | 16 GB GDDR6 | 1 GB, 2 GB, 4 GB, 8 GB, 16 GB | 70 W |
P40 | PCIe 3.0 dual slot | 1 | 24 GB GDDR5 | 1 GB, 2 GB, 3 GB, 4 GB, 6 GB, 8 GB, 12 GB, 24 GB | 250 W |
M10 | PCIe 3.0 dual slot | 4 | 32 GB GDDR5 (8 GB per GPU) | 0.5 GB, 1 GB, 2 GB, 4 GB, 8 GB | 225 W |
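The per-server memory and power figures in the comparison above follow directly from the per-card values in Table 1. The Python sketch below reproduces that arithmetic. The class and function names are assumptions for illustration; the card counts per PowerEdge R740xd (six T4, three P40, three M10) are the ones stated in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuCard:
    name: str
    memory_gb: int       # total memory per card (Table 1)
    power_w: int         # power per card (Table 1)
    cards_per_server: int  # maximum per PowerEdge R740xd, as stated in this post


CARDS = (
    GpuCard("T4", memory_gb=16, power_w=70, cards_per_server=6),
    GpuCard("P40", memory_gb=24, power_w=250, cards_per_server=3),
    GpuCard("M10", memory_gb=32, power_w=225, cards_per_server=3),
)


def server_totals(card):
    """Return (total GPU memory in GB, total GPU power in W) per server."""
    return (card.memory_gb * card.cards_per_server,
            card.power_w * card.cards_per_server)


if __name__ == "__main__":
    for card in CARDS:
        memory_gb, power_w = server_totals(card)
        print(f"{card.name}: {memory_gb} GB, {power_w} W per server")
```

For the T4 and M10 this gives the 96 GB, 420 W, and 675 W figures quoted in the text; the 255 W difference is 675 W minus 420 W.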
GPU sizing and support for mixed workloads
With multi-monitor setups and 4K monitors becoming the norm in the modern workplace, streaming high-resolution videos can saturate the encoding engine on the GPUs and increase the load on the CPUs, affecting the performance and scalability of VDI systems. Thus, it is important to size the GPUs based on the number of encoding streams and required frames per second (fps). The Tesla T4 comes with an enhanced NVIDIA NVENC encoder that can provide higher compression and better image quality in H.264 and H.265 (HEVC) video codecs. The Tesla T4 can encode 22 streams at 720p (progressive scan) resolution, with simultaneous display in high-quality mode. On average, the Tesla T4 can also handle 10 streams at 1080p and 2–3 streams at Ultra HD (2160p) resolutions. Running in a low-latency mode, it can encode 37 streams at 720p resolution, 17–18 streams at 1080p resolution, and 4–5 streams in Ultra HD.
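The stream counts above can be turned into a first-pass estimate of how many Tesla T4 cards a given number of concurrent encode streams would need. This Python sketch is a simplification under stated assumptions: it uses the lower bound of each quoted range, assumes all streams share one resolution and mode, and considers only the encoder, not frame buffer, CPU, or other limits. The dictionary and function names are hypothetical.

```python
import math

# Streams per Tesla T4, using the lower bound of each range quoted above.
T4_STREAMS_PER_CARD = {
    "high_quality": {"720p": 22, "1080p": 10, "2160p": 2},
    "low_latency": {"720p": 37, "1080p": 17, "2160p": 4},
}


def t4_cards_for_streams(streams, resolution, mode="high_quality"):
    """Estimate the T4 cards needed to encode `streams` concurrent streams."""
    if streams < 0:
        raise ValueError("streams must be non-negative")
    per_card = T4_STREAMS_PER_CARD[mode][resolution]
    return math.ceil(streams / per_card)


if __name__ == "__main__":
    # Hypothetical requirement: 48 concurrent 1080p streams.
    print(t4_cards_for_streams(48, "1080p", "high_quality"))
    print(t4_cards_for_streams(48, "1080p", "low_latency"))
```

Using the lower bounds keeps the estimate conservative; actual capacity depends on content, frame rate, and encoder settings.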
VDI remote protocols such as VMware Blast Extreme can use NVIDIA GRID software and the Tesla T4 to encode video streams in H.265 and H.264 codecs, which can reduce the encoding latency and improve fps, providing a better user experience in digital workplaces. The new Tesla T4 NVENC encoder provides up to 25 percent bitrate savings for H.265 and up to 15 percent bitrate savings for H.264. Refer to this NVIDIA blog to learn more about the Tesla T4 NVENC encoding improvements.
The Tesla T4 is well suited for use in a data center with mixed workloads. For example, it can run VDI workloads during the day and compute workloads at night. This concept, known as VDI by Day, HPC by Night, increases the productivity and utilization of data center resources and reduces data center operating costs.
Tesla T4 testing on Dell EMC VDI Ready Solution
At Dell EMC, our engineering team tested the NVIDIA Tesla T4 on our Ready Solutions VDI stack based on the Dell EMC VxRail hyperconverged infrastructure. The test bed environment was a 3-node VxRail V570F appliance cluster that was optimized for VDI workloads. The cluster was configured with 2nd Generation Intel Xeon Scalable processors (Cascade Lake) and with NVIDIA Tesla T4 cards in one of the compute hosts. The environment included the following components:
- PowerEdge R740xd server
- Intel Xeon Gold 6248, 2 x 20-core, 2.5 GHz processors (Cascade Lake)
- NVIDIA Tesla T4 GPUs
- 768 GB of host memory (12 x 64 GB @ 2,933 MHz)
- VMware vSAN hybrid datastore using an SSD caching tier
- VMware ESXi 6.7 hypervisor
- VMware Horizon 7.7 VDI software layer
Dell EMC Engineering used the Power Worker workload from Login VSI for testing. You can find background information about Login VSI analysis at Login VSI Analyzing Results.
The GPU-enabled PowerEdge compute server hosted 96 VMs with a GRID vPC vGPU profile (T4-1B) of 1 GB memory each. The host was configured with six NVIDIA Tesla T4 cards, the maximum possible configuration for the NVIDIA Tesla T4 in a Dell PowerEdge R740xd server.
With all VMs powered on, the host server recorded a steady-state average CPU utilization of approximately 95 percent and a steady-state average GPU utilization of approximately 34 percent. Login VSImax (the number of active sessions at the saturation point of the system) was not reached, which means the performance of the system was very good. We relaxed our standard threshold of 85 percent for average CPU utilization in this testing to demonstrate performance when graphics resources are fully utilized (96 profiles per host). You might get a better user experience by keeping CPU utilization at or below 85 percent, either by decreasing user density or by using a higher-binned CPU. However, if your CPU is a previous-generation Intel Xeon Scalable processor (Skylake), the recommendation is to use only up to four NVIDIA Tesla cards per PowerEdge R740xd server. With six T4 cards per PowerEdge R740xd server, the GPUs were connected to both x8 and x16 lanes. We found no issues using both x8 and x16 lanes and, as indicated by the Login VSI test results, system performance was very good.
Dell EMC Engineering performed similar tests with a Login VSI Multimedia Workload using 48 vGPU-enabled VMs on a GPU-enabled compute host, each having a Quadro vDWS vGPU profile (T4-2Q) with a 2 GB frame buffer. With all VMs powered on, the average steady-state CPU utilization was approximately 48 percent, and the average steady-state GPU utilization was approximately 35 percent. The system performed well and the user experience was very good.
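The VM counts in both tests match the number of vGPU profiles that fit in the frame buffer of six Tesla T4 cards. The Python sketch below shows that calculation. It assumes that each VM's vGPU is carved from a single physical GPU, so frame buffer is divided per GPU rather than pooled across cards; the function name is hypothetical.

```python
def max_vgpu_vms(gpus, gpu_memory_gb, profile_gb):
    """Maximum VMs by frame buffer alone, one vGPU profile per VM.

    Frame buffer is divided per physical GPU (assumption: a vGPU does
    not span GPUs), so any remainder on each GPU goes unused.
    """
    if gpus < 0 or gpu_memory_gb <= 0 or profile_gb <= 0:
        raise ValueError("arguments must be positive (gpus may be zero)")
    vms_per_gpu = gpu_memory_gb // profile_gb
    return gpus * vms_per_gpu


if __name__ == "__main__":
    # Six Tesla T4 cards (16 GB each), as in the tests described above.
    print(max_vgpu_vms(6, 16, 1))  # T4-1B profile, 1 GB
    print(max_vgpu_vms(6, 16, 2))  # T4-2Q profile, 2 GB
```

This is an upper bound from graphics memory only. As the CPU utilization figures above show, CPU can become the limiting resource before the frame buffer does.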
For more information about the test-bed environment configuration and additional resource utilization metrics, see the design and validation guides for VMware Horizon on VxRail and vSAN on our VDI Info Hub.
Summary
Just as Windows 10 and modern applications are incorporating more graphics to meet user expectations, virtualized environments must keep pace with demands for an improved user experience. Dell EMC Ready Solutions for VDI, coupled with the NVIDIA Tesla T4 vGPU, are tested and validated solutions that provide the high-quality user experience that today’s workforce demands. Dell EMC Engineering used Login VSI’s Power Worker Workload and Multimedia Workload to test Ready Solutions for VDI with the Tesla T4, and observed very good results in both system performance and user experience.
In the next blog, we will discuss the effect of memory speed on VDI user density, based on testing done by the Dell EMC VDI engineering team. Stay tuned, and we’d love to get your feedback!
VDI on Dell Technologies Cloud Platform – Part 1: Introduction
Fri, 09 Dec 2022 13:47:56 -0000
The way we work is changing. With more employees working from home and outside the office on flexible schedules, organizations are shifting towards digital workspaces. Digital workspaces allow employees to access their applications and data from anywhere, anytime, across any device. The flexibility offered by digital workspaces fosters collaboration and enhances the productivity of employees.
Virtual desktop infrastructure (VDI) is an enabling technology for workspace transformation initiatives. A growing number of organizations rely on VDI for providing accessibility to business applications and data while ensuring a secure and superior user experience. VDI provides the agility, security, and centralized management that are critical to successful workspace transformation initiatives.
According to a survey by market intelligence company IDC, 93 percent of customers will deploy their workloads across two or more clouds. A multi-cloud approach comes with its unique benefits, and VDI is a workload that takes full advantage of it. For example, VDI customers can take advantage of the flexibility and economics of the multi-cloud approach by extending their on-premises infrastructure to handle seasonal demand spikes, by hosting a disaster recovery (DR) environment in the public cloud, or both.
However, a multi-cloud approach can increase complexity by creating multiple management and operational silos. Due to the difference in the architecture and environments of the multiple clouds involved, workload migrations are often complicated. Maintaining consistent and efficient security is challenging with multiple cloud providers, and existing security best practices adopted by your organization may not be portable across a multi-cloud environment. The best solution to overcome these challenges is a hybrid cloud approach that offers consistent operations and infrastructure.
VCF on VxRail, the Dell Technologies Cloud Platform [1] (DTCP), takes the complexity out of a multi-cloud environment by offering true hybrid compatibility and facilitating consistent operations across private and public cloud environments. DTCP is an on-premises infrastructure based on industry-leading Dell EMC VxRail hyper-converged infrastructure running VMware Cloud Foundation (VCF). It offers options to extend your infrastructure to Dell Technologies’ partner public clouds, providing choice and flexibility. DTCP allows you to build standardized VMware Software-Defined Data Center (SDDC) architecture that provides a consistent infrastructure connecting your on-premises and public clouds.
Figure 1: VCF on VxRail, Dell Technologies Cloud Platform
Let’s see how you can benefit from a VDI solution based on VMware Horizon running on DTCP.
VMware Horizon on DTCP
Dell Technologies offers a tested and validated VMware Horizon solution running on DTCP for your VDI workloads. Horizon on DTCP allows you to leverage a software-defined infrastructure for compute, storage, networking, and security with the market-leading capabilities of VMware Horizon for a complete, secure, and easy-to-operate desktop and application virtualization solution. The native integration of VxRail Manager with SDDC Manager offers automation and simplifies lifecycle management for your entire VDI stack, including hardware. With VMware NSX [2], you can secure east-west traffic within your data center by creating fast and simple network policies that follow virtual desktops. The micro-segmentation feature of NSX creates a perimeter defense around the virtual desktops, eliminating unauthorized access between virtual desktops and adjacent critical workloads.
Our Horizon solution architecture aligns with the VMware Horizon Cloud Pod Architecture (CPA) [3]. CPA allows you to join multiple pods to form a single Horizon implementation. This pod federation spans multiple sites, simplifying the administration effort that is required to manage a large-scale Horizon deployment. See the ‘VDI on DTCP using VMware Horizon’ reference architecture guide [4], available on the VDI InfoHub, for more details on our validated solution.
VMware Horizon on DTCP offers a hybrid platform where you can easily enable public cloud use-cases like provisioning additional capacity and DR. With DTCP, you can have an extended Horizon deployment on one of our public partner clouds such as VMware Cloud (VMC) on AWS. VMC on AWS delivers VMware SDDCs as a service on the AWS cloud. If you already have a Horizon installation on-premises on VMware SDDC, you can leverage those skills to build a Horizon infrastructure on VMC on AWS. You get a unified architecture, operational consistency, and a similar feature set for Horizon across on-prem and VMC on AWS.
Conclusion
DTCP can offer you a true hybrid cloud experience by delivering consistent operations and infrastructure for your VDI workloads across a multi-cloud environment. By running VDI on DTCP powered by Dell EMC VxRail hyper-converged infrastructure, you can enable typical VDI use-cases like provisioning additional capacity and DR in a simple, flexible and cost-effective manner.
DTCP is also available with subscription pricing [5], which gives you freedom of choice between CapEx and OpEx models. You can start small, scale easily, and align with growing business needs.
I hope you enjoyed reading part 1 of this blog series. In part 2, we will discuss the public cloud interoperability use-cases of VMware Horizon on DTCP in detail. Stay tuned!
Additional Resources
[1] DTCP: https://www.delltechnologies.com/en-in/solutions/cloud/vmware-cloud-foundation-on-vxrail.htm
[2] NSX for Horizon: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/horizon/vmware-nsx-with-horizon.pdf
[3] Horizon Cloud Pod Architecture: https://techzone.vmware.com/resource/workspace-one-and-horizon-reference-architecture#sec10-sub2
[4] VDI on DTCP using VMware Horizon: https://infohub.delltechnologies.com/section-assets/h18160-vdi-dtcp-horizon-reference-architecture
[5] DTCP with subscription: https://www.dellemc.com/en-us/collaterals/unauth/offering-overview-documents/products/dell-technologies-cloud/h18181-dell-technologies-cloud-platform-with-subscription-solution-brief.pdf