The following graphs compare the performance of an earlier iteration of this dual VDI and AI program, run on R740 vSAN Ready Nodes with NVIDIA RTX8000 GPUs, with the performance of the current program. EUX scores are excluded from the comparison because they were tested as part of the current program only.
Note: In the R740 vSAN Ready Node testing, three NVIDIA RTX8000 GPUs were used, with two dedicated to AI and one to VDI.
AI training and validation times both improved significantly on an R750 vSAN Ready Node with an NVIDIA L40 GPU. Training times were 77 percent lower than on the R740 with an RTX8000 GPU.
Validation times also improved, with a 47 percent reduction between the two configurations.
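The reductions quoted above follow the usual relative-change formula, (baseline - new) / baseline. The sketch below shows that arithmetic along with the equivalent speedup factor. The function names are illustrative, and the timings in the usage example are hypothetical placeholders, not the measured R740 or R750 values.

```python
def percent_reduction(baseline_seconds: float, new_seconds: float) -> float:
    """Return how much shorter new_seconds is than baseline_seconds, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - new_seconds) / baseline_seconds * 100.0


def speedup_factor(reduction_percent: float) -> float:
    """Convert a percent reduction in run time into a speedup multiple."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical timings chosen only to illustrate the formula.
    baseline, current = 1000.0, 230.0
    reduction = percent_reduction(baseline, current)
    print(f"reduction: {reduction:.1f} percent")
    print(f"speedup:   {speedup_factor(reduction):.2f}x")
```

Read this way, a 77 percent reduction means the run takes a little under a quarter of the original time, and a 47 percent reduction means it takes a little over half.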
GPU utilization during training is higher on the L40. Average utilization was approximately 29 percent on the RTX8000 GPU and 82 percent on the L40, demonstrating that GPU resources were better used.
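The source does not state how the average utilization figures were collected. As a minimal sketch, assuming utilization is sampled periodically during the run and averaged arithmetically, the calculation could look like the following. The input format (one percentage per line) is an assumption that matches what a command such as `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1` prints. The sample values are hypothetical.

```python
from statistics import mean


def average_utilization(samples_text: str) -> float:
    """Average GPU utilization from text holding one percentage per line.

    Blank lines are skipped. The one-value-per-line format is an assumption.
    """
    values = [float(line) for line in samples_text.splitlines() if line.strip()]
    if not values:
        raise ValueError("no utilization samples provided")
    return mean(values)


if __name__ == "__main__":
    # Hypothetical samples, not measurements from this testing.
    log = "80\n84\n\n82\n"
    print(f"average utilization: {average_utilization(log):.0f} percent")
```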
During validation of the models generated during training, GPU utilization was lower on the L40 than on the RTX8000, even though validation times were significantly lower on the R750 with L40 configuration. Testing shows that validation is a less GPU-intensive activity, so the improved times may be attributable to the performance of the R750 over the R740.
Validation takes place on the complete ChestRay 14 Dataset. Model accuracy is validated against all 15 models obtained during training, and the highest average AUC results are displayed below. The area under the curve (AUC) accuracy is slightly better with the models generated by the R740 with RTX8000 GPUs than with those generated by the R750 with the L40. Model training on the R740 was done on two GPUs compared to one on the R750, which explains the difference in accuracy. As an experiment, the VDI solutions team attempted training and validation on the R750 using both L40 GPUs, and the model accuracy improved to 75 percent.
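The selection of the highest average AUC across the trained models can be sketched as follows. This is an illustration under stated assumptions, not the team's validation code: it assumes the "average AUC" is the unweighted mean of per-class AUC values, computes each AUC with the pairwise ranking definition, and uses the hypothetical names `binary_auc`, `mean_auc`, and `best_model`. The labels and scores in the usage example are made up.

```python
from statistics import mean


def binary_auc(labels, scores):
    """AUC for one class: the share of (positive, negative) pairs ranked
    correctly, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(labels_by_class, scores_by_class):
    """Unweighted mean of per-class AUC values (an assumed averaging scheme)."""
    return mean(
        binary_auc(labels_by_class[name], scores_by_class[name])
        for name in labels_by_class
    )


def best_model(mean_auc_by_model):
    """Return (model_name, mean_auc) for the model with the highest mean AUC."""
    return max(mean_auc_by_model.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical two-class example; the real dataset has more classes.
    labels = {"class_a": [0, 0, 1, 1], "class_b": [1, 0, 1, 0]}
    scores = {"class_a": [0.1, 0.4, 0.35, 0.8], "class_b": [0.9, 0.2, 0.7, 0.3]}
    results = {"model_01": mean_auc(labels, scores), "model_02": 0.70}
    print(best_model(results))
```

The pairwise definition is quadratic in the number of samples, so a full-dataset run would normally use a sorted-rank implementation from a metrics library instead.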