The goal of this use case was to identify the optimal configuration for the ProHawk Vision virtualized servers that will be used within the Computer Vision platform.
The Dell Validated Design for Computer Vision allows us to quickly explore different combinations of virtual CPUs, memory allocation, and GPU mapping to find one that uses all resources efficiently, without significant over- or under-allocation. For many CV workloads, finding a good ratio of CPU cores to GPU encoder/decoder hardware and CUDA cores takes careful evaluation. This testing used both the NVIDIA A16 and A40 GPUs. We configured each test with a range of CPU core, GPU, and memory allocations.
Results
These results are based on multiple tests in which the parameters were tuned to identify the right balance. Per ProHawk's recommendation, each video stream requires 4 vCPUs to be allocated for processing. A subset of the results is shown below.
- A16 Results
The NVIDIA A16 GPU card presents itself as 4 x A2 cards, so it is possible to allocate 1 to 4 GPUs to a VM when using an A16.
Table 1. NVIDIA A16 GPU video stream results

| Stream Count | GPU | vCPU | Memory | CPU Usage % | GPU Usage % |
|---|---|---|---|---|---|
| 1 | 1/4 A16 (A2) | 8 | 16 GB | 23 | 35 |
| 2 | 1/4 A16 (A2) | 8 | 16 GB | 53 | 69 |
| 3 | 1/4 A16 (A2) | 12 | 16 GB | 59 | 94 |
| 4 | 1/4 A16 (A2) | 16 | 16 GB | 57 | 98 |
| 4 | 1/2 A16 (2xA2) | 16 | 16 GB | 54 | 74 |
| 4 | 3/4 A16 (3xA2) | 16 | 16 GB | 50 | 67 |
| 4 | A16 (4xA2) | 16 | 24 GB | 47 | 32 |

- A40 Results
The NVIDIA A40 GPU card is a single, double-width card with a high density of CUDA cores for processing data center workloads. It is possible to assign vGPU profiles that split the card into smaller units, but the entire card was used for this testing.
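Table 2. NVIDIA A40 GPU video stream results

| Stream Count | GPU | vCPU | Memory | CPU Usage % | GPU Usage % |
|---|---|---|---|---|---|
| 1 | A40 | 4 | 16 GB | 40 | 7 |
| 2 | A40 | 8 | 16 GB | 39 | 15 |
| 3 | A40 | 12 | 16 GB | 40 | 21 |
| 4 | A40 | 16 | 16 GB | 40 | 29 |
| 5 | A40 | 20 | 16 GB | 46 | 34 |
| 6 | A40 | 24 | 16 GB | 40 | 49 |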
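To see how the A40 figures translate into VM size, the following is a rough sketch that extrapolates the Table 2 data linearly. The function name `projected_streams` and the 90% GPU ceiling are illustrative assumptions, and the projection assumes GPU usage keeps growing at the per-stream rate seen in the 6-stream test, which was not measured.

```python
import math

# (streams, vCPUs, GPU usage %) rows copied from Table 2.
A40_RESULTS = [(1, 4, 7), (2, 8, 15), (3, 12, 21), (4, 16, 29), (5, 20, 34), (6, 24, 49)]
VCPUS_PER_STREAM = 4  # ProHawk recommendation quoted in the Results section


def projected_streams(results, gpu_ceiling_pct):
    """Linearly extrapolate the stream count at a given GPU usage ceiling.

    Assumption: GPU usage keeps scaling at the per-stream rate observed in
    the largest measured test. Nothing beyond 6 streams was actually tested.
    """
    streams, _, gpu_pct = max(results)  # row with the highest stream count
    per_stream_pct = gpu_pct / streams
    return math.floor(gpu_ceiling_pct / per_stream_pct)


streams = projected_streams(A40_RESULTS, gpu_ceiling_pct=90)  # hypothetical ceiling
print(streams, streams * VCPUS_PER_STREAM)
```

Under these assumptions, a 90% ceiling projects to 11 streams and 44 vCPUs in a single VM, which illustrates why a fully used A40 leads to very large VMs.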
Findings
- With the A40, allocating 4 vCPUs per ProHawk stream creates very large VMs that are difficult to migrate around the system. This could result in VMs with 40+ vCPUs assigned, which is larger than recommended for this processor.
- Because the A16 presents as 4 distinct GPUs, smaller VMs can be created, which improves VM mobility and reduces the impact if a ProHawk VM fails.
- Of the configurations tested above, the optimal one assigns 1 of the A16's A2 GPUs to a VM with 8 vCPUs. This handles 2 real-time ProHawk streams (53% CPU and 69% GPU usage in Table 1).
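The recommended building block (1 A2 GPU, 8 vCPUs, 2 streams per VM) can be turned into a simple sizing calculation. The sketch below is illustrative only: the function `size_deployment` and its output keys are hypothetical names, and it assumes every VM uses the recommended configuration and that each A16 card supplies four A2 GPUs.

```python
import math

STREAMS_PER_VM = 2     # recommended configuration from the Findings
VCPUS_PER_VM = 8       # recommended configuration from the Findings
A2_GPUS_PER_A16 = 4    # the A16 presents as 4 x A2 GPUs


def size_deployment(total_streams):
    """Estimate VM, vCPU, and A16 card counts for a number of streams.

    Assumption: one A2 GPU per VM, and a partially filled VM is still
    sized at the full 8 vCPUs.
    """
    vms = math.ceil(total_streams / STREAMS_PER_VM)
    return {
        "vms": vms,
        "total_vcpus": vms * VCPUS_PER_VM,
        "a16_cards": math.ceil(vms / A2_GPUS_PER_A16),
    }


print(size_deployment(10))
```

For 10 streams this yields 5 VMs, 40 vCPUs in total, and 2 A16 cards, with the vCPUs spread across small VMs rather than concentrated in one.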