The NVIDIA T4 GPU and software components include:
- NVIDIA T4 GPU—A single-slot, low-profile, PCI Express (PCIe) Gen3 deep learning accelerator card that is based on the TU104 NVIDIA GPU. The T4 GPU has 16 GB of GDDR6 memory and a 70 W maximum power limit. It is a passively cooled board.
The T4 GPU is powered by NVIDIA Turing Tensor Cores to accelerate inference, video transcoding, and virtual desktops.
This Ready Solution uses T4 cards for video decoding and deep learning inference.
- NVIDIA DeepStream SDK—An SDK that delivers a streaming analytics toolkit for AI-based video and image processing, and multisensor processing. DeepStream is part of the NVIDIA Metropolis platform, which enables building end-to-end services and solutions that transform videos and images into actionable insights. Relevant features for this solution include:
- Reduced memory footprint that results in enhanced stream processing density
- Integration with Microsoft Azure Edge IoT to build applications and services by using the power of Azure cloud
- Containerized deployment
- Plug-ins for inference, message schema conversion, and message brokering
- Support for heterogeneous cameras, segmentation networks, monochrome images, and hardware-accelerated H.264 and H.265 video decoding
- Support for TensorRT-based inferencing for detection, classification, and segmentation
- NVIDIA TensorRT—An SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT can optimize neural network models that are trained in all major frameworks, calibrate for lower precision while maintaining high accuracy, and deploy to hyperscale data centers, embedded platforms, or automotive product platforms. It is ideally suited for inference on streaming video, such as the retail product identification used in this solution.
TensorRT is built on CUDA, which is a parallel programming model from NVIDIA. TensorRT enables optimized inference for all deep learning frameworks by using libraries, development tools, and technologies in CUDA-X for artificial intelligence, autonomous machines, high-performance computing, and graphics.
You can import trained models from every major deep learning framework into TensorRT. After applying optimizations, TensorRT selects platform-specific kernels to maximize performance on T4 GPUs in the data center, Jetson embedded platforms, and NVIDIA DRIVE autonomous driving platforms.
- NVIDIA CUDA—A parallel computing platform and programming model that was developed by NVIDIA for general computing on GPUs. With CUDA, developers can accelerate computing applications by harnessing the power of GPUs. Applications and operations (such as matrix multiplication) that typically run serially on CPUs can run on thousands of GPU cores in parallel.
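The matrix-multiplication example above can be sketched in plain Python as a conceptual stand-in for the CUDA model: every output element is independent of the others, which is why a GPU can assign one thread per element instead of visiting them one at a time (the `element` function and grid here are illustrative only, not CUDA code):

```python
# Conceptual sketch of the CUDA execution model, in plain Python.
# A CPU computes C = A x B serially, element after element; because
# each element is independent, CUDA instead launches one GPU thread
# per element so thousands of them can execute at once.

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

def element(row, col):
    """The work of one hypothetical GPU thread: a single element of C."""
    return sum(A[row][k] * B[k][col] for k in range(len(B)))

# Serial CPU version: a nested loop visits output elements one at a time.
C_serial = [[element(r, c) for c in range(len(B[0]))] for r in range(len(A))]

# GPU view: the same calls form an independent grid of (row, col) work
# items; their order does not matter, so they could all run concurrently.
grid = [(r, c) for r in range(len(A)) for c in range(len(B[0]))]
results = {rc: element(*rc) for rc in grid}

print(C_serial)  # [[19, 22], [43, 50]]
```

On an actual GPU, `element` would be a CUDA kernel and the grid of (row, col) pairs would map onto CUDA's grid of thread blocks.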