NVIDIA AI Enterprise provides enterprise support for a range of software frameworks, toolkits, workflows, and models used for inferencing. See the NVIDIA AI Enterprise documentation for more information about all components available with NVIDIA AI Enterprise. The following components incorporated in this design are available as part of NVIDIA AI Enterprise:
The following sections describe the key software components and how they are used in this design. For more information about how these key components work together, see Chapter 4.
NVIDIA Triton Inference Server (also known as Triton) is inference-serving software that standardizes AI model deployment and execution, delivering fast and scalable AI in production. Enterprise support for Triton is available through NVIDIA AI Enterprise. It is also available as open-source software.
Triton streamlines and standardizes AI inference by enabling teams to deploy, run, and scale trained machine learning or deep learning models from any framework on any GPU- or CPU-based infrastructure. It provides AI researchers and data scientists the freedom to choose the appropriate framework for their projects without impacting production deployment. It also helps developers deliver high-performance inference across cloud, on-premises, edge, and embedded devices.
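To serve a model, Triton reads it from a model repository, a directory tree in which each model has a configuration file (config.pbtxt) describing its backend, inputs, and outputs. The fragment below is a minimal sketch for a hypothetical text-generation model; the model name, tensor names, and backend choice are illustrative assumptions, not part of this design.

```
# model_repository/my_llm/config.pbtxt  (hypothetical model "my_llm")
name: "my_llm"
backend: "python"          # backend is model-dependent (e.g. tensorrt, onnxruntime, python)
max_batch_size: 8
input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
```

The model weights themselves live in numbered version subdirectories (for example, model_repository/my_llm/1/), which lets Triton load and serve multiple versions of the same model side by side.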
The benefits of Triton for AI inferencing include the following:
Triton Inference Server is at the core of this design: it is the software that hosts the generative AI models and serves their inference requests.
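Clients reach a hosted model through Triton's HTTP/REST endpoint, which implements the KServe v2 inference protocol. The sketch below builds a request body for that protocol; the model name, tensor names, and shapes are hypothetical and would need to match the deployed model's configuration.

```python
import json


def build_infer_request(model_name: str, prompt: str):
    """Build a KServe v2 inference request for Triton's HTTP endpoint
    (POST /v2/models/<model_name>/infer).

    The tensor names "text_input"/"text_output" are illustrative
    assumptions; they must match the model's config.pbtxt.
    """
    body = {
        "inputs": [
            {
                "name": "text_input",
                "shape": [1, 1],
                "datatype": "BYTES",  # strings travel as BYTES in the v2 protocol
                "data": [prompt],
            }
        ],
        "outputs": [{"name": "text_output"}],
    }
    return f"/v2/models/{model_name}/infer", json.dumps(body)


# Example: request path and JSON payload for a hypothetical "my_llm" model.
path, payload = build_infer_request("my_llm", "What is Triton?")
```

The same payload shape works across backends, which is part of what lets Triton standardize serving: clients address tensors by name and datatype rather than by framework-specific APIs.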
NVIDIA Base Command Manager Essentials facilitates seamless operationalization of AI development at scale by providing features like operating system provisioning, firmware upgrades, network and storage configuration, multi-GPU and multi-node job scheduling, and system monitoring. It maximizes the use and performance of the underlying hardware architecture.
In this design, we use NVIDIA Base Command Manager Essentials for:
Cluster monitoring and management, including health monitoring, fault tolerance, resource utilization monitoring, software and package management, security and access control, and scaling