For organizations seeking to transform their operations through computer vision automation, training and optimizing object detection models can be time-consuming and costly. The NVIDIA TAO Toolkit can speed deployment, enabling you to incorporate TAO pre-trained models into an end-to-end, intelligent, video-analytics pipeline. This project is detailed in the document Boost Video Analytics Throughput Using NVIDIA Metropolis - Triton Inference Server and DeepStream SDK.
This document describes using the TAO Toolkit to fine-tune and optimize an existing NVIDIA pre-trained object-detection model for our target use case. After we obtain a sufficiently accurate model through combined retraining and pruning, we use the converter built into the TAO Toolkit to convert the model into inference engines. These engines use the highly optimized TensorRT libraries, which map operations onto parallel GPU computation paths using CUDA instructions. To deploy the model (or the inference engines derived from it), we use the NVIDIA DeepStream SDK and Triton Inference Server running on a Dell PowerEdge R7515 server, applying techniques such as multi-stream processing, dynamic batching, optimized object tracking, and multi-instance inference engine execution to achieve greater throughput. The resulting pipelines, though complex, are easily built and maximize high-accuracy inference throughput. They also demonstrate how deploying NVIDIA GPUs in a cost-optimized PowerEdge server enables large-scale, real-time video analytics.
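As an illustration of two of the techniques named above, dynamic batching and multi-instance engine execution are typically declared in a Triton Inference Server model configuration file (config.pbtxt). The sketch below is a minimal example only; the model name, batch sizes, queue delay, and instance count are hypothetical placeholders, not values taken from this project:

```
# Hypothetical Triton model configuration (config.pbtxt) for a
# TensorRT engine produced by the TAO Toolkit converter.
name: "object_detector"          # placeholder model name
platform: "tensorrt_plan"        # serve a serialized TensorRT engine
max_batch_size: 16

# Dynamic batching: Triton groups individual requests from multiple
# video streams into larger batches to improve GPU utilization.
dynamic_batching {
  preferred_batch_size: [ 4, 8, 16 ]
  max_queue_delay_microseconds: 100
}

# Multi-instance execution: run two copies of the engine on the GPU
# so inference on one batch overlaps with preparation of the next.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```

In practice, the preferred batch sizes and instance count are tuned against the number of concurrent streams and available GPU memory; the DeepStream SDK pipeline then sends decoded frames to this Triton model for inference.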