This paper shows how to achieve high accuracy and high throughput for video-stream object detection using pruned 8-bit integer (INT8) models exported by the NVIDIA DeepStream software stack running on a Dell PowerEdge R7515 server. The INT8 engines run approximately three times faster than the corresponding 32-bit floating-point (FP32) engines.
Using the NVIDIA DeepStream-Triton (DS-Triton) framework, the PowerEdge R7515 can be loaded with multiple streams, multiple model instances, and inference intervals to further increase total inference throughput. The RetinaNet and ResNet18 combination reached 968 FPS, roughly 10 times the throughput of a single-model FP32 engine running on a single stream.
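As an illustration of how multiple model instances are requested, a Triton model configuration (`config.pbtxt`) can declare the instance count per GPU. The model name `retinanet` and the values shown here are hypothetical, not the exact settings used in these tests:

```
# Triton model configuration (config.pbtxt) - hypothetical sketch.
# Runs 8 instances of the model concurrently on GPU 0.
name: "retinanet"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    count: 8
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

The inference interval is configured separately on the DeepStream side; for example, the `interval` property of the primary inference engine tells DeepStream to skip inference on that many consecutive frames between inferred frames.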
The hardware profile analysis conducted with nvtop (NVIDIA TOP) shows that the DS-Triton deployment was primarily compute-bound: average GPU compute utilization was 96 percent across the tests. After being fully loaded with eight streams and eight instances on a single NVIDIA T4 GPU, the deployment also became decoder-bound, with maximum decoder utilization reaching 96 percent. Average GPU memory utilization was about 14 percent, and average encoder utilization was about 22 percent.
We also plan to complete the following evaluations in the future: