To ensure the optimal combination of end-user experience (EUE) and cost-per-user, performance analysis and characterization (PAAC) on Dell VDI solutions is carried out using a carefully designed, holistic methodology that monitors both hardware resource utilization parameters and EUE during load-testing.
Two APEX instance configurations were tested in this validation effort:
- The APEX Private Cloud “Memory optimized” configuration (8 GB of memory per CPU core) was used with VMware Horizon instant clone VDI desktops for one test scenario, outlined below, to validate the solution's performance. This configuration was tested using the Login VSI load-testing tool.
- The APEX Private Cloud “VDI optimized” configuration (32 GB of memory per CPU core) with GPU-accelerated graphics was used for the second test scenario, running vGPU-enabled VDI desktops with a 1 GB frame buffer. This configuration was tested using NVIDIA's nVector testing tool.
A three-host VxRail cluster with 64 CPU cores per host provided a total of 192 instances for the Login VSI test scenario. A single host was used for the graphics testing, providing 64 instances.
Login VSI performance testing process and monitoring
Each test scenario was repeated 4 times:
- A pilot run to validate that the infrastructure was performing correctly and that valid data could be captured.
- Three subsequent runs to enable data correlation.
During testing, while the environment was under load, we logged in to a session and completed tasks that correspond to the user workload. This test is subjective, but it provides a better understanding of the EUE in the desktop sessions, particularly under high load. It also helps to ensure reliable data gathering.
To ensure that the user experience was not compromised, the Dell VDI team monitored the following important resources:
- Compute host servers—For solutions based on VMware vSphere, VMware vCenter gathers key data (CPU, memory, disk, and network usage) from each of the compute hosts during each test run. This data is exported to .csv files for single hosts and then consolidated to show data from all hosts; a consolidation sketch follows the table below. While the report does not include specific performance metrics for the management host servers, these servers are monitored during testing to ensure that they are performing at an expected level with no bottlenecks.
- Hardware resources—Resource overutilization can cause poor EUE. We monitored the relevant resource utilization parameters and compared them to relatively conservative thresholds, selected based on industry best practices and our experience to provide an optimal trade-off between good EUE and cost-per-user while allowing sufficient burst capacity for seasonal or intermittent spikes in demand. The following table shows the thresholds that the Dell VDI team set for this testing; a scripted version of this check is sketched after the table:
Table 1. Resource utilization parameters

Parameter | Pass/fail threshold |
Physical host CPU utilization | 85% |
Physical host memory utilization | 85% |
Network throughput | 85% |
Disk latency | 20 milliseconds |
Login VSI failed sessions | 2% |
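Conceptually, the per-host .csv exports can be consolidated and compared against the thresholds in Table 1 with a short script. The following Python sketch illustrates the idea; the file layout and column names (`cpu_pct`, `memory_pct`, and so on) are assumptions for illustration, not the Dell VDI team's actual tooling.

```python
import glob
import pandas as pd

# Pass/fail thresholds from Table 1 (Login VSI failed sessions are
# reported by Login VSI itself and checked separately).
THRESHOLDS = {
    "cpu_pct": 85.0,         # physical host CPU utilization
    "memory_pct": 85.0,      # physical host memory utilization
    "network_pct": 85.0,     # network throughput
    "disk_latency_ms": 20.0, # disk latency
}

def consolidate(csv_glob: str) -> pd.DataFrame:
    """Merge the per-host vCenter exports into one cluster-wide frame."""
    frames = [pd.read_csv(path) for path in glob.glob(csv_glob)]
    return pd.concat(frames, ignore_index=True)

def evaluate(cluster: pd.DataFrame) -> dict:
    """Compare steady-state averages against the pass/fail thresholds."""
    results = {}
    for metric, limit in THRESHOLDS.items():
        observed = cluster[metric].mean()
        results[metric] = {"observed": round(observed, 2),
                           "limit": limit,
                           "passed": observed <= limit}
    return results

if __name__ == "__main__":
    cluster = consolidate("exports/host-*.csv")  # hypothetical export location
    for metric, outcome in evaluate(cluster).items():
        status = "PASS" if outcome["passed"] else "FAIL"
        print(f"{metric}: {outcome['observed']} (limit {outcome['limit']}) -> {status}")
```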
Load generation
Login VSI installs a standard collection of desktop application software, including Microsoft Office and Adobe Acrobat Reader, on each VDI desktop testing instance. It then uses a configurable launcher system to connect a specified number of simulated users to available desktops within the environment. When the simulated user is connected, a login script configures the user environment and starts a defined workload. Each launcher system can launch connections to several VDI desktops (target machines). A centralized management console configures and manages the launchers and the Login VSI environment.
We used the following login and boot conditions:
- Users were logged in within a login timeframe of 1 hour (a quick pacing calculation follows this list).
- All desktops were started before users were logged in.
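As a back-of-the-envelope check, pacing the cluster-wide session count from the results later in this section (170 users per host across three hosts) over the one-hour window works out to roughly one login every seven seconds:

```python
# Back-of-the-envelope login pacing, assuming the cluster-wide session
# count from the Login VSI results below (170 users/host x 3 hosts).
sessions = 170 * 3            # total simulated users
window_s = 60 * 60            # 1-hour login timeframe, in seconds

interval_s = window_s / sessions
print(f"{sessions} logins over {window_s} s -> one every {interval_s:.1f} s")
# 510 logins over 3600 s -> one every 7.1 s
```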
For NVIDIA nVector, the endpoints and desktops are deployed and monitored from an nVector management VM, which runs the test framework, collects data during the test, and analyzes it afterward. Additionally, the following login and boot paradigm is used (a simple polling sketch follows this list):
- The data collection interval is 1 minute for non-vSAN datastores and 5 minutes for vSAN metrics.
- User logon and workload execution are two separate phases, with sessions staggered to start every 5 seconds.
- All desktops are pre-booted before logins commence.
- Data collection is a combination of the automated nVector management framework and manual scripts.
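The manual-scripts portion of that collection can be as simple as a poller that samples each source on its own cadence. Below is a minimal Python sketch assuming hypothetical sampler functions; it is an illustration of the two collection intervals described above, not NVIDIA's actual framework.

```python
import time

# Collection cadences from this test: 1 minute for non-vSAN datastore
# metrics, 5 minutes for vSAN metrics.
INTERVALS_S = {"datastore": 60, "vsan": 300}

def sample_datastore():
    """Hypothetical placeholder: pull datastore counters (e.g., via the vSphere API)."""
    ...

def sample_vsan():
    """Hypothetical placeholder: pull vSAN performance counters."""
    ...

SAMPLERS = {"datastore": sample_datastore, "vsan": sample_vsan}

def collect(duration_s: int) -> None:
    """Poll each source on its own cadence for the length of the test run."""
    next_due = {name: 0.0 for name in INTERVALS_S}
    start = time.monotonic()
    while (elapsed := time.monotonic() - start) < duration_s:
        for name, interval in INTERVALS_S.items():
            if elapsed >= next_due[name]:
                SAMPLERS[name]()
                next_due[name] += interval
        time.sleep(1)

collect(60 * 60)  # example: collect for a one-hour run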
Login VSI workloads
The following table describes the Login VSI workloads that the Dell VDI team tested:
Login VSI workload name | Workload description |
Knowledge Worker | Designed for virtual machines with 2 vCPUs. This workload includes typical knowledge-worker activities: working in Microsoft Outlook, Word, Excel, and PowerPoint, browsing websites, and viewing PDF documents and photos. |
nVector Knowledge Worker | NVIDIA's knowledge-worker workload, which runs comparable office productivity activities on the vGPU-enabled desktops while the nVector framework measures frame rate, image quality, and end-user latency. |
Desktop VM test configurations
The following table summarizes the desktop VM configurations used for the Login VSI workload that the Dell VDI team tested. While this desktop configuration is appropriate for the Login VSI workload, evolving application and operating system workloads are creating increased resource requirements, with configurations of up to 4 vCPUs and 8 GB RAM becoming increasingly common for knowledge workers.
Workload | vCPUs | RAM | RAM reserved | Desktop video resolution |
Login VSI Knowledge Worker | 2 | 4 GB | 2 GB | 1920 x 1080 |
The following table summarizes the desktop VM configurations used for the NVIDIA nVector workload that the Dell VDI team tested:
Workload | vCPUs | RAM | RAM reserved | Desktop video resolution | vGPU profile |
nVector Knowledge Worker | 2 | 4 GB | 4 GB | 1920 x 1080 | 1B |
Summary of test results
The following table summarizes the host utilization metrics for the Login VSI workload that we tested, and the user density derived from the performance testing:
Instance type | Operating system | User density per host | Users per instance | Average CPU | Average active memory | Average IOPS per user | Average network per user (Mbps) |
Memory Optimized | Win 10 22H2 | 170 | 2.65 | 85.4% | 212 GB | 7.5 | 6.16 |
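The derived per-user columns in the preceding table follow from simple division, as this quick check illustrates (the figures are taken from the Memory Optimized row; "instances" corresponds to the 64 CPU cores per host described earlier):

```python
# Quick arithmetic check of the derived columns in the preceding table,
# using the Memory Optimized row and the 64 CPU cores (instances) per host.
hosts = 3
cores_per_host = 64
user_density = 170                      # users per host

users_per_instance = user_density / cores_per_host
total_users = user_density * hosts

print(f"Users per instance: {users_per_instance:.2f}")       # 2.66 (reported as 2.65)
print(f"Cluster users: {total_users}")                        # 510

# The per-user columns are cluster-wide rates divided by the user count,
# so the underlying cluster totals were roughly:
print(f"Cluster IOPS: {7.5 * total_users:.0f}")               # 3825
print(f"Cluster network: {6.16 * total_users / 1000:.2f} Gbps")  # 3.14
```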
The following tables summarize the host utilization metrics for the nVector workload that we tested, and the user density derived from the performance testing:
Instance type | Operating system | User density per host | Users per instance | Average CPU | Average active memory | Average IOPS per user | Average network per user (Mbps) |
VDI Optimized | Win 10 22H2 | 128 | 2 | 88% | 384 GB | 15.4 | 3.47 |
Average GPU | Frames per second | Image quality | End-user latency (ms) |
25.75% | 23 | 97% | 152 |
The host utilization metrics shown in the preceding table are defined as follows:
- User density—The number of users per compute host that successfully completed the workload test within the acceptable resource limits for the host. For clusters, this number reflects the average of the density achieved for all compute hosts in the cluster.
- Users per instance—The number of users per instance for the Memory optimized or VDI optimized configurations. This value is directly related to the memory-to-CPU-core ratio.
- Average CPU—The average CPU usage over the steady state period. For clusters, this number represents the combined average CPU usage of all compute hosts. On the latest Intel processors, the ESXi host CPU metrics can exceed the rated 100 percent for the host when Turbo Boost is enabled, which is the default setting. An additional 35 percent of CPU capacity is available from the Turbo Boost feature, but this additional headroom is not reflected in the VMware vSphere metrics from which the performance data is gathered.
- Average active memory—For ESXi hosts, the amount of memory that is actively used, as estimated by the VMKernel based on recently touched memory pages. For clusters, this is the average amount of physical guest memory that is actively used across all compute hosts over the steady state period.
- Average IOPS per user—IOPS calculated from the average cluster disk IOPS over the steady state period divided by the number of users.
- Average network usage per user—Average network usage on all hosts calculated over the steady state period divided by the number of users.
- End-user latency—Measures how remote the session feels or how interactive the session is (the amount of lag).
- Frame rate—Measures the number of frames per second delivered to the endpoint.
- Image quality—Measures how much the image was affected and manipulated by the remote display protocol (VMware Blast). The SSIM metric is the structural similarity of screenshots taken on the VDI desktop and on the endpoint (thin client); a minimal example of this comparison follows this list.
- Average GPU—The combined average GPU usage of all installed GPUs over the test period.
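To make the image-quality metric concrete, the following sketch shows how an SSIM score could be computed from a pair of screenshots using scikit-image. The file names are hypothetical, and this is an illustration of the metric itself, not NVIDIA's nVector implementation.

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def image_quality(desktop_png: str, endpoint_png: str) -> float:
    """SSIM between a screenshot taken on the VDI desktop and the same
    frame captured on the endpoint; 1.0 means the remoting protocol
    altered nothing."""
    desktop = np.asarray(Image.open(desktop_png).convert("RGB"))
    endpoint = np.asarray(Image.open(endpoint_png).convert("RGB"))
    return structural_similarity(desktop, endpoint, channel_axis=-1)

# Hypothetical capture files; a score of 0.97 would correspond to the
# 97% image quality reported for the nVector run above.
score = image_quality("desktop_frame.png", "endpoint_frame.png")
print(f"Image quality (SSIM): {score:.2%}")
```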