NVIDIA EGX Platform
The NVIDIA EGX Platform is a cloud-native, scalable software stack for AI workloads running at the edge, as shown in Figure 1. The platform includes the NVIDIA GPU Operator for deploying and managing the GPU driver and the other components that are required for hosting GPU-enabled containers. The NVIDIA GPU Operator is based on the Kubernetes operator framework and automates the management of all NVIDIA software components that are needed to provision GPUs within Kubernetes. It automatically installs and manages the following components:
- NVIDIA GPU driver—Drivers required for NVIDIA GPUs.
- NVIDIA Container Runtime—A GPU-aware container runtime, compatible with popular container technologies such as Docker.
- NVIDIA Kubernetes device plug-in—A device plug-in for Kubernetes that automatically detects GPUs in Kubernetes nodes and manages them.
- NVIDIA Data Center GPU Manager (DCGM)—A suite of tools for managing and monitoring GPUs in cluster environments. It includes active health monitoring, comprehensive diagnostics, system alerts, and governance policies including power and clock management.
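Once the GPU Operator has provisioned a node, the stack can be sanity-checked from inside a GPU-enabled container. The following minimal CUDA sketch (an illustrative check, not part of the EGX stack itself; the file name is arbitrary) asks the CUDA runtime for the visible devices. If the driver, container runtime, and device plug-in are working together, it lists each GPU that the container was granted:

```cpp
// devicequery.cu -- sanity check for a GPU-enabled container.
// Build with: nvcc -o devicequery devicequery.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // A failure here usually means the driver or container runtime
        // is not exposing a GPU to this container.
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d: %s, %zu MiB, compute capability %d.%d\n",
                    i, prop.name, prop.totalGlobalMem >> 20,
                    prop.major, prop.minor);
    }
    return 0;
}
```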
The following table lists the EGX software stack component versions. Newer versions might be available as they are released. For more information, see the EGX GitHub website:
Table 2. EGX software stack component details

| Component | Version |
|---|---|
| GPU Operator | 1.0.0 |
| NVIDIA driver | 440.33.01 |
| NVIDIA container runtime | 1.0.5 |
| NVIDIA Kubernetes device plug-in | 1.0.0-beta4 |
| NVIDIA DCGM | 1.7.2 |
| Helm | 2.14.3 |
| Kubernetes | 1.15.3 |
| Container runtime | Docker CE 19.03.6 |
| Operating system | Ubuntu Server 18.04.3 LTS |
NVIDIA Metropolis
NVIDIA Metropolis is a collection of tools and technologies that enables AI for smart cities and retail:
- NVIDIA DeepStream SDK—An SDK that delivers a streaming analytics toolkit for AI-based video, image, and multisensor processing. Developers use it to build AI-powered intelligent video analytics applications and services. DeepStream is a primary component of the NVIDIA Metropolis platform and enables building end-to-end services and solutions that transform videos and images into actionable insights (see the pipeline sketch after this list). Relevant features for this solution include:
- Reduced memory footprint that results in enhanced stream processing density
- Containerized deployment
- Plug-ins for inference, message schema conversion, and message brokers
- Support for heterogeneous cameras, segmentation networks, monochrome images, and hardware-accelerated H.264 and H.265 video decoding
- Support for TensorRT-based inferencing for detection, classification, and segmentation
- NVIDIA TensorRT—An SDK for high-performance deep learning inference. It includes an inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications.
TensorRT is built on CUDA, NVIDIA's parallel programming model, and enables optimized inference for all major deep learning frameworks by using libraries, development tools, and technologies in CUDA-X for AI, autonomous machines, high-performance computing, and graphics. A sketch of a typical TensorRT engine-build workflow appears after this list.
- NVIDIA CUDA—A parallel computing platform and programming model that NVIDIA developed for general computing on GPUs. With CUDA, developers can accelerate computing applications by harnessing the power of GPUs: applications and operations (such as matrix multiplication) that typically run serially on a CPU can instead run in parallel across thousands of GPU cores, as the kernel example after this list shows.
- NVIDIA T4 GPU—A single-slot, low-profile, PCI Express Gen3 deep learning accelerator card that is based on the NVIDIA TU104 GPU. The T4 GPU has 16 GB of GDDR6 memory and a 70 W maximum power limit. It is a passively cooled board.
NVIDIA Turing Tensor Cores power the T4 GPU to accelerate inference, video transcoding, and virtual desktops.
This Ready Solution uses T4 cards for video decoding and deep learning inference.
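The following sketch shows how a DeepStream pipeline of the kind described above is typically assembled. It is a minimal, illustrative example: the element names (nvv4l2decoder, nvstreammux, nvinfer, nvdsosd, and so on) are standard DeepStream GStreamer plug-ins, but the input file, resolution, and inference configuration path are placeholders, and the exact set of elements varies with the DeepStream release:

```cpp
// deepstream_sketch.cpp -- minimal DeepStream pipeline built with
// gst_parse_launch(). File names and config path are placeholders.
#include <gst/gst.h>

int main(int argc, char *argv[]) {
    gst_init(&argc, &argv);

    // Decode an H.264 file, batch it with nvstreammux, run TensorRT-based
    // detection with nvinfer, draw bounding boxes, and render the result.
    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(
        "filesrc location=sample_720p.h264 ! h264parse ! nvv4l2decoder ! "
        "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! "
        "nvinfer config-file-path=detector_config.txt ! "
        "nvvideoconvert ! nvdsosd ! nveglglessink",
        &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Block until the stream ends or an error occurs.
    GstBus *bus = gst_element_get_bus(pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered(
        bus, GST_CLOCK_TIME_NONE,
        (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

    if (msg) gst_message_unref(msg);
    gst_object_unref(bus);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}
```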
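As a sketch of the TensorRT engine-build workflow referenced above, the following hedged example parses an ONNX model and builds an FP16-optimized engine with the TensorRT C++ API. It assumes the TensorRT 7-era API that matches the component versions in Table 2, and the model file name is a placeholder:

```cpp
// trt_build_sketch.cpp -- build a TensorRT engine from an ONNX model.
// Assumes the TensorRT 7-era C++ API; "model.onnx" is a placeholder.
#include <iostream>
#include "NvInfer.h"
#include "NvOnnxParser.h"

// TensorRT requires a logger implementation.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main() {
    using namespace nvinfer1;

    IBuilder* builder = createInferBuilder(gLogger);
    // Explicit batch mode is required by the ONNX parser.
    const auto flags = 1U << static_cast<uint32_t>(
        NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flags);

    auto parser = nvonnxparser::createParser(*network, gLogger);
    if (!parser->parseFromFile("model.onnx",
            static_cast<int>(ILogger::Severity::kWARNING))) {
        std::cerr << "Failed to parse model" << std::endl;
        return 1;
    }

    IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB of scratch space
    config->setFlag(BuilderFlag::kFP16);      // T4 Tensor Cores support FP16

    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    if (!engine) {
        std::cerr << "Engine build failed" << std::endl;
        return 1;
    }
    std::cout << "Engine built with " << engine->getNbBindings()
              << " bindings" << std::endl;

    // Runtime inference would create an IExecutionContext from this engine.
    engine->destroy();
    config->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}
```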
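To make the CUDA programming model concrete, here is the kernel example referenced above: a standard, illustrative vector addition (not code from this solution) in which each GPU thread handles one element in parallel, work that a CPU would perform serially in a loop:

```cpp
// vector_add.cu -- each GPU thread computes one output element in parallel.
// Build with: nvcc -o vector_add vector_add.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // one thread per element
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit copies also work.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    std::printf("c[0] = %.1f (expected 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```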