This paper shows how to achieve high accuracy and high throughput for video-stream object detection using pruned 8-bit integer (INT8) models exported by the NVIDIA DeepStream software stack running on a Dell PowerEdge R7515 server. The INT8 engines run approximately three times faster than the corresponding 32-bit floating-point (FP32) engines.
Using the NVIDIA DeepStream-Triton (DS-Triton) framework, the PowerEdge R7515 can be loaded with multiple streams, multiple model instances, and inference intervals to further increase total inference throughput. The RetinaNet and ResNet18 combination reached 968 FPS, roughly 10 times the throughput of a single-model FP32 engine running on a single stream.
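As an illustration of how multiple model instances are requested, a Triton model configuration (`config.pbtxt`) can declare the instance count per GPU. The model name `retinanet` and the values shown here are hypothetical, not the exact settings used in these tests:

```
# Triton model configuration (config.pbtxt) - hypothetical sketch.
# Runs 8 instances of the model concurrently on GPU 0.
name: "retinanet"
platform: "tensorrt_plan"
max_batch_size: 8
instance_group [
  {
    count: 8
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

The inference interval is configured separately on the DeepStream side; for example, the `interval` property of the primary inference engine tells DeepStream to skip inference on that many consecutive frames between inferred frames.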
The hardware profile analysis conducted with nvtop (NVIDIA TOP) shows that the DS-Triton deployment was primarily compute-bound: average GPU compute utilization was 96 percent across the tests. After being fully loaded with eight streams and eight instances on a single NVIDIA T4 GPU, the deployment also became decoder-bound, with maximum decoder utilization reaching 96 percent. Average GPU memory utilization was about 14 percent, and average encoder utilization was about 22 percent.
We also plan to complete the following evaluations in the future: