We performed this testing on a VMware Horizon virtual desktop environment hosted on a single Dell EMC PowerEdge R7525 server that was equipped with 2nd Gen AMD EPYC processors and six NVIDIA T4 GPUs. We used the NVIDIA nVector performance assessment and benchmarking tool for this testing.
We tested the following three configurations using the NVIDIA nVector tool:
- A GPU-enabled configuration running the nVector Knowledge Worker workload (96 virtual desktops with NVIDIA T4-1B vGPU profiles)
- A non-GPU configuration running the nVector Knowledge Worker workload (96 virtual desktops without vGPU profiles)
- A virtual workstation configuration running the nVector SPECviewperf 13 workload (24 virtual workstations with NVIDIA Quadro DWS T4-4Q vGPU profiles)
NVIDIA nVector is a performance testing tool from NVIDIA for benchmarking VDI workloads. The nVector tool creates a load on the system by simulating a workload that matches a typical VDI environment. The tool assesses the experience at the endpoint device rather than the response time of the virtual desktop.
The nVector tool captures the performance metrics that quantify user experience, including image quality, frame rate, and user latency, from the endpoints. When combined with resource utilization information from the servers under test, these metrics enable IT teams to assess the needs of their graphics-accelerated VDI environment.
We performed multiple test runs for each user load scenario to eliminate single-test bias. We used a pilot run to validate that the solution was functioning as expected and that testing data was being captured. We then performed subsequent runs to confirm that the results were consistent.
To confirm the true end-user experience (EUE), we logged into a VDI session and completed several tasks typical of a normal user workload. This small incremental load on the system did not significantly affect our ability to produce reproducible results. While this assessment is subjective, it provides a better understanding of the end-user experience under high load and helps to confirm the reliability of the overall testing data.
The nVector tool runs the simulated workflow of a typical VDI workload at a predesignated scale. This part of the test requires performance monitoring to measure resource utilization. Acting as an execution engine, nVector orchestrates the stages that are involved in measuring EUE for a predefined number of VDI instances.
The following figure shows the stages in the NVIDIA benchmarking tool's measurement of user experience:
We collected host performance metrics and EUE metrics for the tests involving the nVector Knowledge Worker workload. For the nVector SPECviewperf 13 workload test (workstation configuration), we collected host performance metrics and SPEC benchmark scores. The nVector end-user experience metrics were not collected for the SPECviewperf 13 workload.
The combination of virtual desktop profiles and simulated user workloads determines the total number of users (density) that the VDI solution can support. This testing focused on the NVIDIA nVector Knowledge Worker and nVector SPECviewperf 13 workloads. Specific metrics and capabilities define each virtual desktop profile and user workload. It is important to understand these terms in the context of this document.
We carried out load testing on each profile using a workload that was representative of the relevant use case. The following table summarizes the profile-to-workload mapping:
Profile                  Workload
Knowledge Worker         nVector Knowledge Worker
Virtual Workstation      nVector SPECviewperf 13
The following sections of this guide look in detail at the nVector Knowledge Worker and nVector SPECviewperf 13 workloads used in this PAAC testing.
nVector Knowledge Worker workload
The nVector Knowledge Worker workload contains a mix of typical office applications, including some multimedia usage, and is representative of what a typical office worker does during the working day.
nVector SPECviewperf 13 workload
The SPECviewperf 13 benchmark is the worldwide standard for measuring graphics performance based on professional applications. The benchmark measures the 3D graphics performance of systems running under the OpenGL and DirectX application programming interfaces (APIs).
The benchmark's workloads, called viewsets, represent graphics content and behavior from actual applications. The SPECviewperf 13 workload uses a series of viewsets taken from independent software vendor (ISV) applications to characterize the graphics performance of a physical or virtual workstation. For our testing, we ran three iterations of each of the following nine viewsets: 3dsmax, Catia, Creo, Maya, Energy, Medical, Showcase, snx, and sw.
For more information about SPECviewperf 13 viewsets, see the SPEC website.
We used VMware vCenter to gather key host utilization metrics, including CPU, GPU, memory, disk, and network usage, from the compute host during each test run. We exported this data to .csv files for each host and then consolidated it for reporting, as sketched below.
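As an illustration only, the following is a minimal consolidation sketch, assuming one vCenter export per host with a timestamp column named "Time"; the actual column names and layout depend on the vCenter performance-chart export, and this script is not part of the nVector tooling.

```python
# A minimal sketch: consolidate per-host vCenter .csv exports for reporting.
# The "exports/" directory and the "Time" column name are assumptions;
# adjust them to match the actual vCenter export format.
import glob
import os

import pandas as pd

frames = []
for path in glob.glob("exports/*.csv"):
    df = pd.read_csv(path, parse_dates=["Time"])   # one export per host
    df["host"] = os.path.splitext(os.path.basename(path))[0]
    frames.append(df)

consolidated = pd.concat(frames, ignore_index=True)

# Average each numeric utilization metric per host for the summary tables
summary = consolidated.groupby("host").mean(numeric_only=True)
summary.to_csv("consolidated_summary.csv")
```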
Resource over-utilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds. We selected the thresholds, based on industry best practices and our experience, to provide an optimal trade-off between good EUE and cost per user while allowing sufficient burst capacity for seasonal or intermittent spikes in demand. The following table shows the pass/fail threshold for each host utilization metric; a minimal sketch of the pass/fail check follows the table.
Metric                              Pass/fail threshold
Physical host CPU utilization
Physical host memory utilization
Physical host CPU readiness
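To make the pass/fail evaluation concrete, the following minimal sketch compares per-run average utilization against threshold values. The threshold numbers below are placeholders, not the values from the table above.

```python
# Placeholder thresholds (percent); substitute the values from the table above.
THRESHOLDS = {
    "cpu_utilization": 85.0,
    "memory_utilization": 85.0,
    "cpu_readiness": 10.0,
}

def evaluate(run_averages: dict) -> dict:
    """Return PASS/FAIL per metric for one test run's average utilization."""
    return {metric: ("PASS" if run_averages[metric] <= limit else "FAIL")
            for metric, limit in THRESHOLDS.items()}

# Example run data (illustrative only)
print(evaluate({"cpu_utilization": 62.3,
                "memory_utilization": 48.0,
                "cpu_readiness": 1.2}))
```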
This section explains the EUE metrics measured by the nVector tool. These metrics include image quality, frame rate, and end-user latency.
Metric 1: Image quality. NVIDIA nVector uses a lightweight agent on the VDI desktop and on the client to measure image quality. These agents take multiple screen captures on the VDI desktop and on the thin client for later comparison. The structural similarity (SSIM) index of each screen capture taken on the client is computed by comparing it to the corresponding capture taken on the VDI desktop. When the two images are similar, the heatmap shows more colors toward the upper end of the spectrum, with an SSIM value closer to 1.0, as shown on the right-hand side of Figure 38. As the images become less similar, the heatmap shows more colors toward the lower end of the spectrum, with an SSIM value of less than 1.0. More than a hundred pairs of images are captured across the entire set of user sessions, and the average SSIM index of all pairs is computed to give the overall remote session quality for all users.
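For readers who want to reproduce the image-quality calculation outside nVector, the following is a minimal sketch using scikit-image's SSIM implementation. The file names and capture set are hypothetical, and this is not nVector's own code; nVector's agents handle capture and pairing automatically.

```python
# A minimal SSIM comparison of paired screen captures, assuming RGB(A) PNGs
# of identical resolution on both sides of the remoting session.
import numpy as np
from skimage.color import rgb2gray
from skimage.io import imread
from skimage.metrics import structural_similarity

def frame_ssim(desktop_path: str, client_path: str) -> float:
    """SSIM between a VDI-side capture and its client-side counterpart."""
    desktop = rgb2gray(imread(desktop_path)[..., :3])  # drop alpha if present
    client = rgb2gray(imread(client_path)[..., :3])
    # data_range=1.0 because rgb2gray returns floats in [0, 1]
    return structural_similarity(desktop, client, data_range=1.0)

# Average SSIM over all captured pairs gives the overall session quality
pairs = [("desktop_001.png", "client_001.png"),
         ("desktop_002.png", "client_002.png")]  # hypothetical capture set
overall = float(np.mean([frame_ssim(d, c) for d, c in pairs]))
print(f"Overall remote session quality (SSIM): {overall:.3f}")
```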
Metric 2: Frame rate. Frame rate is a common measure of user experience and indicates how smooth the experience is. It measures the rate at which frames are delivered to the screen of the endpoint device. For the duration of the workload, NVIDIA nVector collects data on the frames per second (FPS) sent to the display device on the end client. This data is collected from thousands of samples, and the 90th-percentile value is reported. A higher FPS indicates a more fluid user experience.
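The percentile reduction itself is straightforward. A minimal sketch, assuming the per-sample FPS values have been exported to a hypothetical one-column file:

```python
# Reduce thousands of per-sample FPS measurements to the reported value.
import numpy as np

fps_samples = np.loadtxt("fps_samples.csv", delimiter=",")  # hypothetical export
fps_p90 = np.percentile(fps_samples, 90)
print(f"90th-percentile FPS: {fps_p90:.1f}")
```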
Metric 3: End-user latency. The end-user latency metric measures the responsiveness of a remote desktop or application: the duration of any lag that an end user experiences when interacting with the remote desktop or application.
This section describes the hardware and software components that we used to validate the solution.
Host hardware configuration
The following table shows the server hardware configuration:
Component             Details
Server                Dell EMC PowerEdge R7525
CPU                   2 x AMD EPYC 7502 (32-core, 2.5 GHz)
GPU                   6 x NVIDIA T4
Memory                1024 GB @ 3200 MT/s (16 x 64 GB DDR4)
Boot device           BOSS-S1 card, 256 GB (hypervisor)
vSAN cache tier       2 x 800 GB SAS SSD
vSAN capacity tier    4 x 1.92 TB SAS SSD
Network adapter       Mellanox ConnectX-5 25 GbE dual-port SFP28
Software components and versions
The following table shows the software component version details:
Component                     Version
VMware vSphere ESXi           6.7.0 - 15160138
Windows 10 desktop            1909 - 18363.778
Windows 10 endpoint           1607 - 14393.36.30
NVIDIA GRID                   10.1 - 442.06
Horizon agent
The following table shows the configuration of the VDI virtual desktops:
Setting                     Value
ESXi memory configured
ESXi memory reservation
Screen resolution           1920 x 1080 (all three desktop configurations)
GPU and non-GPU comparison
This section compares GPU and non-GPU test results for the NVIDIA nVector Knowledge Worker workload. For the GPU test, we used a single-node R7525 compute host with six NVIDIA T4 GPUs and enabled 96 virtual machines with an NVIDIA T4-1B vGPU profile. For the non-GPU test, we used an R7525 compute host hosting 96 virtual machines without vGPU profiles. The server was part of a three-node VMware vSAN software-defined storage cluster. Both tests were performed on VMware Horizon 7 linked-clone virtual desktops, using the Horizon Blast Extreme remote display protocol with H.264 hardware encoding.
Our objective in performing these tests and comparing the results was to identify whether the GPUs improve the performance and EUE of a VDI virtual desktop running the NVIDIA nVector Knowledge Worker workload. Table 15 compares the utilization metrics gathered from vCenter for both tests, while Table 16 compares the end-user experience metrics generated by the nVector tool.
The key findings from the comparison of results are reflected in the summary tables that follow.
The following table summarizes the average host utilization metrics for the two nVector Knowledge Worker tests (GPU-enabled and non-GPU): density per host, average CPU usage, average GPU usage, CPU core utilization, average active memory, average memory consumed, and average net Mbps per user.
The following table summarizes the NVIDIA nVector end-user experience metrics, including density per host, for the two nVector Knowledge Worker tests (GPU-enabled and non-GPU).
For details of the host performance metrics (CPU, GPU, memory, and network usage) collected from vCenter, and the EUE metrics (image quality, frame rate, and end-user latency) measured at the endpoints by the nVector tool, see Appendix B and Appendix C.
SPECviewperf 13 virtual workstation test summary
This section summarizes the SPEC benchmark scores obtained from the nine SPECviewperf 13 viewsets that we ran. A higher SPEC score indicates greater speed for the simulated graphics application running in the virtual workstation.
We used a single-node R7525 compute host with six NVIDIA T4 GPUs for this virtual workstation configuration test. We enabled 24 virtual machines with an NVIDIA Quadro DWS T4-4Q vGPU profile. The server was part of a three-node VMware vSAN software-defined storage cluster. The tests were performed on VMware Horizon 7 linked-clone virtual desktops, using the Horizon Blast Extreme remote display protocol with H.264 hardware encoding.
The objective of this testing was to obtain the SPEC benchmark scores for nine SPECviewperf viewsets: 3dsmax, Catia, Creo, Maya, Energy, Medical, Showcase, snx, and sw. The SPECviewperf tool measures the FPS at which the GPU can render scenes across a wide variety of applications and usage models. Each viewset represents an application or a usage model, and each composite score is based on a weighted geometric mean of many different scenes and rendering modes.
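To illustrate how a composite viewset score is assembled, the following is a minimal sketch of a weighted geometric mean. The scene FPS values and weights are illustrative only, not SPEC data.

```python
# Weighted geometric mean: the reduction SPECviewperf uses to fold
# per-scene FPS results into one composite viewset score.
import numpy as np

def weighted_geomean(scores, weights):
    s = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    return float(np.exp(np.sum(w * np.log(s))))

scene_fps = [42.7, 61.3, 35.9]           # per-scene FPS (illustrative)
scene_weights = [0.5, 0.3, 0.2]          # per-scene weights (illustrative)
print(f"Composite viewset score: {weighted_geomean(scene_fps, scene_weights):.2f}")
```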
Figure 39 shows the SPEC scores from the nine SPECviewperf 13 viewsets that we ran. The viewsets ran concurrently on all 24 virtual workstations on the host, and the graph shows the average SPEC score across those 24 virtual workstations. Higher scores indicate greater speed for the application. We ran three iterations of each SPECviewperf 13 viewset. The SPEC scores from our tests indicate excellent graphics performance for the professional graphics applications tested in the virtual workstations.
You can compare SPEC benchmark scores from our performance testing with other published scores on the SPEC website.
For details of the CPU and GPU host performance metrics recorded for each of the nine viewset tests that we ran, see Appendix D.
The following figure gives a summary of the SPEC scores:
The following table shows the SPECviewperf 13 FPS scores:
Note: You can find the results and raw data for the SPECviewperf 13 benchmark testing here: https://dell.app.box.com. See the SPEC website for details of these viewsets.