Deploy and Serve Models with Text Generation Inference (TGI)
This section covers how to set up and run the TGI-gaudi framework, a version of Text Generation Inference (TGI) optimized for Intel Gaudi accelerators. TGI is designed for deploying and serving large language models efficiently, enabling developers to integrate state-of-the-art natural language processing capabilities into their applications. The following steps demonstrate how to set up the framework, generate text responses from user inputs, and run example benchmarks.
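Once a TGI-gaudi server is running, clients interact with it over a simple HTTP API. The sketch below shows one way to send a prompt to TGI's `/generate` endpoint and read back the generated text; the endpoint URL and the sampling parameters are illustrative assumptions, so substitute the address and settings from your own deployment.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address exposed by your
# TGI-gaudi service (port depends on your deployment configuration).
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt: str, max_new_tokens: int = 128) -> dict:
    """Build a request body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # cap on generated tokens
            "temperature": 0.7,                # illustrative sampling value
        },
    }


def generate(prompt: str, url: str = TGI_URL) -> str:
    """POST a prompt to a running TGI server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # TGI returns a JSON object containing a "generated_text" field.
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("What is the PowerEdge XE9680?")` against a live server returns the model's completion as a string.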
Note: TGI-gaudi is released periodically, so ensure that the versions and related configurations in the following .yaml files are updated to match the latest release.
Sample files (ConfigMap, Service, Deployment, and Job YAML files) for the TGI use case, running Llama-3-8B at FP8 precision, are provided for reference in the following GitHub repository:
https://github.com/dell-examples/generative-ai/tree/main/intel-XE9680-gaudi3/TGI
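To illustrate the shape of these resources, the fragment below sketches a minimal Kubernetes Deployment for a TGI-gaudi server. The image tag, model ID, resource names, and Gaudi device count are illustrative assumptions only; use the values from the repository's actual YAML files for a working deployment.

```yaml
# Illustrative sketch only -- names, image tag, and counts are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-gaudi          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-gaudi
  template:
    metadata:
      labels:
        app: tgi-gaudi
    spec:
      containers:
        - name: tgi
          image: ghcr.io/huggingface/tgi-gaudi:2.0  # pin to the current release
          args:
            - "--model-id"
            - "meta-llama/Meta-Llama-3-8B"          # model from this use case
          ports:
            - containerPort: 80
          resources:
            limits:
              habana.ai/gaudi: 1                    # Gaudi devices per pod
```

The ConfigMap, Service, and Job files in the repository follow the same pattern, wiring model configuration, network exposure, and benchmark runs around this Deployment.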
The following deployment files are provided in the TGI repository: