We used the Login VSI test suite to simulate the user experience for several profile types under the typical workload for that type. The following table summarizes the test results that we obtained for the compute hosts using the various workloads and configurations:
User density per host—The number of users per compute host that successfully completed the workload test within the acceptable resource limits for the host. For clusters, this number reflects the average of the density that is achieved for all compute hosts in the cluster.
Average CPU usage—The average CPU usage over the steady-state period. For clusters, this number represents the combined average CPU usage of all compute hosts.
Average active memory—For ESXi hosts, the amount of memory that is actively used, as estimated by the VMkernel based on recently touched memory pages. For clusters, this is the average amount of guest physical memory that is actively used across all compute hosts over the steady-state period.
Average IOPS per user—The average disk IOPS over the steady-state period divided by the number of users.
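The per-host density and per-user IOPS figures above are simple averages. As a short sketch (Python is our choice here; the per-host values are illustrative, not test data):

```python
# User density per host for a cluster: the mean of the per-host densities.
# Average IOPS per user: steady-state average disk IOPS divided by user count.
# All values below are illustrative placeholders, not measured results.
host_densities = [322, 322, 322]          # hypothetical three-node cluster
density = sum(host_densities) / len(host_densities)

steady_state_iops = 3864                  # hypothetical cluster-wide average
users = sum(host_densities)
iops_per_user = steady_state_iops / users

print(density)        # mean users per compute host
print(iops_per_user)  # average disk IOPS per user
```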
1 In a comparable VMware Horizon environment, we observed a 49 percent density increase compared with 2nd Generation AMD EPYC processors. This increase can be used as indicative guidance for Citrix-based environments as well.
Knowledge Worker, 322 users per host, ESXi 7.0, Citrix 7 1912 LTSR CU1
We performed the testing and validation outlined in this section on a three-node vSAN Ready Node cluster. We tested a total of 966 Knowledge Worker VMs across three hosts for a per-host density of 322. We used the Citrix MCS linked-clone provisioning method to provision pooled-random desktop VMs and used Citrix Thinwire+ as the remote display protocol.
We populated each compute host with 322 virtual machines for a total of 966. With all user virtual machines powered on before the start of the test, the CPU usage was approximately 11 percent.
The following figure shows the performance data for 322 user sessions per host. The CPU reached a steady-state average of 84.5 percent during the test cycle when all users were logged in.
Note: When viewing the CPU Usage metric within VMware vSphere, we observed a number of CPU spikes during testing. We investigated these spikes and determined that they did not affect the workload. This appears to be a measurement and reporting issue rather than a performance issue. If we receive additional insight from AMD or VMware, we will update this document.
CPU core utilization had a steady-state average of 70.4 percent and peaked at 82 percent, indicating that there was still headroom for extra CPU cycles per core.
The CPU readiness percentage was low throughout testing, indicating that the VMs had no significant delays in scheduling CPU time. The steady-state average readiness was 1.8 percent, and the peak of 6.63 percent remained below the 10 percent threshold. There was a slight spike in readiness during the logoff phase, a direct result of the Citrix Hosting Configuration settings allowing a high number of concurrent actions on the system.
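The readiness check above amounts to comparing the sample average and peak against the 10 percent threshold. A minimal Python sketch, with a hypothetical helper name and illustrative sample values (not actual test data):

```python
# Hypothetical helper: summarize CPU readiness samples (percent) against
# the 10 percent threshold used in this validation.
def summarize_readiness(samples, threshold=10.0):
    avg = sum(samples) / len(samples)
    peak = max(samples)
    return {"average": avg, "peak": peak, "within_threshold": peak < threshold}

# Illustrative steady-state readiness samples, in percent
samples = [1.2, 1.8, 2.1, 1.5, 6.63, 1.9]
print(summarize_readiness(samples))
```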
Memory consumption was not an issue for the cluster. Out of a total of 2,048 GB of available memory per node, the compute hosts reached a maximum memory consumption of 1,282 GB, with active memory usage reaching a maximum of 821 GB. There was no ballooning or swapping at any point during the test.
Network bandwidth was not an issue during this test, which ran with a steady-state average of 1,900 Mbps and a steady-state peak of approximately 3,238 Mbps. The busiest period for network traffic came just after all user logoffs had completed, when the host reached a peak of 8,391 Mbps during the deletion and re-creation of the MCS clones.
The following figure shows the disk IOPS for the vSAN datastore. The graph clearly shows the initial desktop logins, the steady-state and logoff phases, and finally the re-creation of the desktops after testing was complete.
The cluster reached a maximum total of 42,416 disk IOPS (read + write) during the MCS clone re-creation period after testing and a steady-state average of 3,873 disk IOPS (read + write). The steady-state peak was 7,371 disk IOPS (read + write).
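Using the steady-state average above, the per-user figure works out as defined earlier: average disk IOPS over the steady-state period divided by the number of users. A quick Python check with the values from this test run:

```python
# Average IOPS per user for this run: steady-state cluster IOPS / users.
# Values taken from the test results reported above.
steady_state_iops = 3873   # steady-state average, read + write
users = 966                # 322 users per host across three hosts
iops_per_user = steady_state_iops / users
print(round(iops_per_user, 1))  # → 4.0
```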
Disk I/O latency
Disk I/O latency was not an issue during the Login VSI testing period of this test run. The maximum latency on the vSAN datastore during steady state was approximately 1.1 ms (read + write), well below the 20 ms threshold that is generally regarded as potentially troublesome. The average latency during steady state was 0.57 ms (read + write). A high latency spike during MCS clone re-creation was again due to the aggressive connection settings for vCenter.
The Login VSI VSImax user experience score, shown in the following figure, was not reached for this test, indicating that there was no deterioration in user experience at the number of users we tested.
The following table defines the Login VSI user experience metrics:
Table 10. Description of Login VSI metrics
Login VSI metrics
VSImax
VSImax shows the number of sessions that can be active on a system before the system is saturated. It is the point where the VSImax V4 average graph line meets the VSImax V4 threshold graph line. The intersection is indicated by a red X in the Login VSI graph. This number gives you an indication of the scalability of the environment (higher is better).
VSIbase
VSIbase is the best performance of the system during a test (the lowest response times). This number is used to determine what the performance threshold will be. VSIbase gives an indication of the base performance of the environment (lower is better).
VSImax v4 average
VSImax v4 average is calculated on the number of active users that are logged into the system, but removes the two highest and two lowest samples to provide a more accurate measurement.
VSImax v4 threshold
VSImax v4 threshold indicates at which point the environment's saturation point is reached (based on VSIbase).
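The trimming behind the VSImax v4 average can be illustrated as a simplified trimmed mean: drop the two highest and two lowest response-time samples, then average the rest. This is a sketch of that one step, not Login VSI's full VSImax v4 computation, and the sample values are hypothetical:

```python
def vsimax_v4_average(samples):
    """Average the response-time samples after removing the two highest
    and two lowest values, as described for the VSImax v4 average."""
    if len(samples) <= 4:
        raise ValueError("need more than 4 samples to trim")
    trimmed = sorted(samples)[2:-2]
    return sum(trimmed) / len(trimmed)

# Hypothetical response times in milliseconds; the outliers (100, 3000)
# are discarded along with the next-lowest and next-highest samples.
print(vsimax_v4_average([800, 820, 790, 3000, 810, 805, 100, 795]))  # → 802.5
```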
We modified the vSphere Connection settings from their defaults to the ones shown in the following figure. We performed this modification to decrease the time taken to provision and re-create the Citrix MCS linked clones between tests. This change caused the high IOPS, network bandwidth, and latency seen in the system during the re-creation phase of the tests. The validation used a two-port 25 GbE NIC design, where storage traffic was configured to use one port and workload and management traffic used the other. Latency would most likely be reduced by either configuring the network in an alternative manner or by adding additional NICs for vSAN network traffic.
2,048 GB of memory installed on each node is more than sufficient for the number of desktops tested.
With MCS linked clones, the deletion and re-creation of the virtual machines after users log out generates the highest workload on each host, higher than at any point during the actual test period itself. The CPU reached the same maximum levels as it did during testing, and the memory, network, and datastore metrics all surpassed the levels seen during the actual test period.
We used the Citrix Virtual Apps and Desktops Thinwire+ remote display protocol with default settings during testing.
The data collection interval was 1 minute for all non-vSAN datastore metrics and 5 minutes for all vSAN metrics.