In this performance study, we used two workloads, both of which are available in the NVIDIA DeepLearningExamples GitHub repository. For validation, we used synthetic data with both the training and inference workloads. All workloads described in this section were run in an NVIDIA AI Enterprise (NVAIE) 3.0 TF1 container.
Note: The performance numbers in this design guide are not formal benchmarks; they are results from validating the workloads described here.
We used images per second as the performance metric. The experiments were run on four validation setups; the following table describes each environment. Each reported result is the average of three runs.
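As a minimal sketch of how the reported numbers are derived, the snippet below averages per-run throughput samples. The sample values are hypothetical; in practice they would be read from the DeepLearningExamples training or inference logs for each setup.

```python
def mean_throughput(samples):
    """Average images/second across repeated runs of one validation setup."""
    return sum(samples) / len(samples)

# Hypothetical images/second values from three runs of one setup.
runs = [1510.2, 1498.7, 1503.1]

print(f"Average throughput: {mean_throughput(runs):.1f} images/sec")
# → Average throughput: 1504.0 images/sec
```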
Table 6. Validation setup and configuration

| Validation setup | vCPUs / Memory | GPU profile |
|------------------|----------------|-----------------|
| Setup 1 | 3 / 6 GB | A100-40GB-1-5C |
| Setup 2 | 4 / 8 GB | A100-40GB-2-10C |
| Setup 3 | 6 / 24 GB | A100-40GB-3-20C |
| Setup 4 | 10 / 48 GB | A100-40GB-7-40C |
We enabled MIG on the ESXi host for all training and inference runs.
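The host-side steps can be sketched as below. This is an illustrative dry-run script, not the documented procedure from this guide: it assumes GPU index 0 and an NVIDIA vGPU host driver already installed, and it only prints the `nvidia-smi` commands unless `DRY_RUN` is set to 0.

```shell
#!/bin/sh
# Sketch: enabling MIG mode on a host GPU for MIG-backed vGPU profiles.
# Assumptions: GPU index 0, NVIDIA host driver present. Defaults to a
# dry run that echoes each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"          # print the command that would be executed
    else
        "$@"               # execute for real (requires GPU + driver)
    fi
}

# Enable MIG mode on GPU 0 (VMs using the GPU must be stopped first).
run nvidia-smi -i 0 -mig 1

# Confirm that MIG mode is now enabled.
run nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv
```

With MIG mode enabled, the MIG-backed vGPU profiles listed in Table 6 (such as A100-40GB-1-5C) can be assigned to the validation VMs.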