Test tools
To ensure the optimal combination of end-user experience and cost-per-user, the performance analysis and characterization on Dell Technologies VDI Solutions is carried out using a carefully designed, holistic methodology that monitors both hardware resource utilization parameters and user experience during load testing.
Login VSI is the industry-standard tool for testing VDI environments and server-based computing. The tool installs a standard collection of desktop application software (for example, Microsoft Office and Adobe Acrobat Reader) on each VDI desktop. It then uses launcher systems to connect a specified number of users to available desktops within the environment. After a user is connected, a login script configures the user environment and then starts the test script that runs the workload. Each launcher system can launch connections to several target machines (in this case, VDI desktops).
For Login VSI, the launchers and virtual session indexer environment are configured and managed by a centralized management console. Additionally, the following login and boot paradigm is used:
- Users were logged in within a login timeframe of 1 hour. An exception to this login timeframe occurred when testing low-density solutions such as GPU/graphics-based configurations. With those configurations, users were logged in every 5 seconds.
- All desktops were pre-booted before logins commenced.
- The data collection interval for vSAN metrics was 5 minutes.
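The login paradigm above implies simple pacing arithmetic, sketched below. The sketch assumes that logins are spread evenly across the 1-hour window, which the text does not state explicitly. The function names and the user counts in the usage lines are hypothetical and are not taken from the test results.

```python
def login_interval_seconds(user_count: int, window_seconds: float = 3600.0) -> float:
    """Seconds between consecutive logins if users are spread evenly over the window.

    Assumption: even spacing across the 1-hour login timeframe.
    """
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return window_seconds / user_count


def ramp_duration_seconds(user_count: int, interval_seconds: float = 5.0) -> float:
    """Time from the first login to the last login at a fixed interval.

    Models the low-density (GPU/graphics) case of one login every 5 seconds.
    """
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return (user_count - 1) * interval_seconds


# Hypothetical user counts, for illustration only.
standard_interval = login_interval_seconds(400)   # 400 users over 1 hour
gpu_ramp = ramp_duration_seconds(48)              # 48 users, one every 5 seconds
```

With these hypothetical counts, 400 users in a 1-hour window corresponds to one login every 9 seconds, and 48 users at a 5-second interval are all logged in after 235 seconds.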
For the AI testing, AI training and AI model validation took place on the same VM. After training, the resulting models were validated with the same VM hardware configuration that was used in training.
The methodology that we used for training and model validation was similar to the method that was used in the AI-assisted Radiology Using Distributed Deep Learning on Apache Spark and Analytics Zoo White Paper. Model weights were collected for 15 epochs during training, and each set of weights was subsequently validated with the same techniques as in the referenced paper. For example, the average AUC-ROC score was calculated across all disease categories contained in the dataset. The model with the highest average AUC-ROC can be used for inferencing on real-world data. We validated against the entire dataset used for training and not against a withheld (holdout) portion of the dataset. Validating against a holdout portion is a common validation technique.
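The per-epoch selection described above can be illustrated with a minimal sketch. It is not the code used in the referenced paper: it computes AUC-ROC for each category with the rank-sum (Mann-Whitney U) identity in plain Python, averages over categories, and picks the epoch with the highest average. All function names and the toy labels and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC for one binary category, using average ranks to handle tied scores."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs both positive and negative examples")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Average AUC-ROC over categories; row i holds the values for image i."""
    n_categories = len(y_true[0])
    aucs = [
        auc_roc([row[c] for row in y_true], [row[c] for row in y_score])
        for c in range(n_categories)
    ]
    return sum(aucs) / n_categories


def best_epoch(mean_auc_by_epoch: Dict[int, float]) -> int:
    """Epoch whose saved weights gave the highest average AUC-ROC."""
    return max(mean_auc_by_epoch, key=mean_auc_by_epoch.get)


# Toy data: 4 images, 2 categories (the real dataset has 14 categories).
toy_true = [[0, 1], [0, 0], [1, 1], [1, 0]]
toy_score = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
toy_mean = mean_auc(toy_true, toy_score)
```

In practice, `mean_auc` would be evaluated once per saved epoch, and `best_epoch` would then select the weights to keep for inferencing.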
Resource monitoring
The team used several methods for resource and component monitoring during performance testing and in the deployed solutions where applicable.
We used VMware vCenter for VMware vSphere-based solutions to gather key data (CPU, GPU, memory, disk, and network usage) from each of the compute hosts during each test run. We exported this data to .csv files for single hosts and then consolidated it to show data from all hosts (when multiple hosts were tested). While the report does not include specific performance metrics for the management host servers, we monitored these servers during testing to ensure that they were performing at the expected level with no bottlenecks.
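The consolidation step for multi-host runs might look like the following sketch. The column names `timestamp` and `cpu_usage_pct`, the host names, and the sample values are assumptions made for illustration; they do not reflect the actual layout of a vCenter export.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def consolidate(host_csvs: Dict[str, Iterable[str]], metric: str) -> List[dict]:
    """Merge per-host CSV rows into one row per timestamp.

    Each output row has the timestamp, one column per host, and the mean across hosts.
    Assumes every CSV has a 'timestamp' column and a numeric column named by `metric`.
    """
    by_time: Dict[str, Dict[str, float]] = defaultdict(dict)
    for host, lines in host_csvs.items():
        for row in csv.DictReader(lines):
            by_time[row["timestamp"]][host] = float(row[metric])
    merged = []
    for ts in sorted(by_time):
        values = by_time[ts]
        merged.append({"timestamp": ts, **values, "mean": sum(values.values()) / len(values)})
    return merged


# In-memory stand-ins for two exported files (hypothetical values).
host_a = io.StringIO("timestamp,cpu_usage_pct\n10:00,60\n10:05,80\n")
host_b = io.StringIO("timestamp,cpu_usage_pct\n10:00,70\n10:05,90\n")
rows = consolidate({"host-a": host_a, "host-b": host_b}, "cpu_usage_pct")
```

Open file handles can be passed in place of the `io.StringIO` objects, because `csv.DictReader` accepts any iterable of lines.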
We gathered GPU performance metrics directly from the vSphere client, either manually or using a script.
For resource utilization, we determined the user density at a reasonable system load. Testing to system failure was out of scope. To achieve a reasonable system load, we set target thresholds for system resources, as described in the following table. These thresholds reflect a system that is well utilized but not near failure.
Metrics | Target threshold |
Average CPU usage | 85% |
Average CPU core utilization | 85% |
Average CPU readiness | 10% |
Average memory utilization (active) | 85% |
Consumed memory | <100% |
Memory ballooning | None |
Memory swapping | None |
Network throughput | 85% |
Storage latency | 20 milliseconds (ms) |
Spare storage capacity | 15% |
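As an illustration of how the thresholds in the table could be applied to a test run, the following sketch flags any metric that falls outside its target. The metric key names are hypothetical. The sketch also assumes that a value exactly at a threshold passes, that "None" for ballooning and swapping means a measured value of zero, and that spare storage capacity is a minimum while the other percentages are maximums.

```python
from typing import Dict, List

# Upper limits taken from the table (percent, except latency in ms).
MAX_THRESHOLDS = {
    "avg_cpu_usage_pct": 85.0,
    "avg_cpu_core_utilization_pct": 85.0,
    "avg_cpu_readiness_pct": 10.0,
    "avg_active_memory_pct": 85.0,
    "network_throughput_pct": 85.0,
    "storage_latency_ms": 20.0,
}


def threshold_violations(metrics: Dict[str, float]) -> List[str]:
    """Return the names of metrics that fall outside the target thresholds."""
    violations = [name for name, limit in MAX_THRESHOLDS.items() if metrics[name] > limit]
    if metrics["consumed_memory_pct"] >= 100.0:   # table target: <100%
        violations.append("consumed_memory_pct")
    if metrics["memory_ballooning"] > 0:          # table target: None
        violations.append("memory_ballooning")
    if metrics["memory_swapping"] > 0:            # table target: None
        violations.append("memory_swapping")
    if metrics["spare_storage_pct"] < 15.0:       # assumed to be a minimum
        violations.append("spare_storage_pct")
    return violations


# Hypothetical measurements for one run.
sample = {
    "avg_cpu_usage_pct": 82.0,
    "avg_cpu_core_utilization_pct": 80.0,
    "avg_cpu_readiness_pct": 4.0,
    "avg_active_memory_pct": 60.0,
    "network_throughput_pct": 30.0,
    "storage_latency_ms": 3.5,
    "consumed_memory_pct": 92.0,
    "memory_ballooning": 0.0,
    "memory_swapping": 0.0,
    "spare_storage_pct": 40.0,
}
result = threshold_violations(sample)
```

An empty result means that the run stayed within every target, which is the condition used here for a "reasonable system load".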
Test configuration
The following table describes the hardware and software components of the infrastructure that we used for the performance analysis and characterization tests:
Category | Platform | CPU | Memory | Network |
Compute host hardware | 4 x VxRail V570F (Density Optimized) | 2 x Intel Xeon Gold 6248 at 2.5 GHz, 20-core processors | 768 GB DDR4 at 2933 MT/s (64 GB x 12) | Broadcom Adv. Dual 25 Gb Ethernet |
Management host hardware | 4 x VxRail E460F (Management) | 2 x Intel Xeon CPU E5-2698 v4 at 2.20 GHz | 512 GB DDR4 at 2400 MT/s (32 GB x 16) | Intel 2P X520/2P I350 rNDC |
The following table describes the hardware and software components that we used:
Category | Component |
NVIDIA GPU | 3 NVIDIA RTX 8000 GPU cards installed in one host |
Storage | Compute, Management |
Network | PowerSwitch S5248-ON switch |
Patches | All Spectre/Meltdown/L1TF patches were applied to the parent image and ESXi hosts as required. |
Protocol | BLAST Extreme H.264 + Switch Codec |
Broker | VMware Horizon 8.0.0, build 16592062 |
Hypervisor | VMware vSphere ESXi 7.0.0, build 15843807; VMware vCenter 7.0.0, build 15952498 |
SQL | Microsoft SQL Server 2019 |
vGPU software version | NVIDIA vGPU 11 |
Desktop operating system | Microsoft Windows 10 Enterprise 64-Bit, 1909 version |
Microsoft Office version | Office 2019 |
Management operating system | Microsoft Windows Server 2019 |
Login VSI version | Login VSI 4.1.40.1 |
Antivirus software | Windows Defender |
AI training and validation configuration, dataset, and model
The following tables show the AI VM and operating system configurations and the dataset that we chose for our AI training and validation. Other AI VM configurations are possible and might improve performance; however, we did not explore them extensively.
Configuration | CPU | Memory | Disk | GPU PCI 0* | GPU PCI 1* | GPU PCI 2* |
AI VM | 20 vCPUs, 1 core per socket, 20 sockets | 256 GB reserved | 560 GB HD | Rtx8000 Grid_rtx8000p-48q | Rtx8000 Grid_rtx8000p-48q | Rtx8000 Grid_rtx8000p-48q |
*In the testing scenarios that follow, we performed testing with one, two, or all of the GPUs assigned to the AI VM.
Operating system | Version | Kernel | NVIDIA driver | NVIDIA-SMI CUDA | CUDA |
Ubuntu | 18.04.4 LTS | 5.4.0-47-generic | 440.87 | 10.2 | 11.0.182 |
Data set | Images | Description |
ChestXRay14 | 112,120 PNG images | Chest x-ray dataset containing over 100,000 frontal-view x-ray images with 14 labeled disease categories. |
Neural network topology | Description |
DenseNet | DenseNet alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters. |
Graphics VM configuration
The following table shows the configuration of the graphics VM:
Configuration | CPU | Memory | Disk | GPU PCI 0 |
Graphics VM | 4 vCPUs | 8 GB reserved | 60 GB HD | Rtx8000 Grid_rtx8000p-2q |
Medical dataset used for graphics
The following table shows the medical dataset that we used for the VDI graphics tests:
Dataset | Images | Description |
DICOM Library for abdomen | 361 DICOM images | Anonymized abdominal CT dataset containing 361 DICOM images. |