Deploy and Serve Models with Text Generation Inference (TGI)
This section covers how to set up and run the TGI-gaudi framework, a version of Text Generation Inference (TGI) optimized for Intel Gaudi accelerators. TGI is designed for deploying and serving large language models efficiently, enabling developers to integrate state-of-the-art natural language processing capabilities into their applications. The following steps demonstrate how to set up the framework, generate text responses from user inputs, and run example benchmarks.
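Once a TGI-gaudi server is running, clients interact with it over a simple HTTP API. The sketch below shows one way to send a prompt to TGI's `/generate` endpoint and read back the generated text; the endpoint URL and the sampling parameters are illustrative assumptions, so substitute the address and settings from your own deployment.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the address exposed by your
# TGI-gaudi service (port depends on your deployment configuration).
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt: str, max_new_tokens: int = 128) -> dict:
    """Build a request body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # cap on generated tokens
            "temperature": 0.7,                # illustrative sampling value
        },
    }


def generate(prompt: str, url: str = TGI_URL) -> str:
    """POST a prompt to a running TGI server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # TGI returns a JSON object containing a "generated_text" field.
        return json.loads(resp.read())["generated_text"]
```

Calling `generate("What is the PowerEdge XE9680?")` against a live server returns the model's completion as a string.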
Note: TGI-gaudi is released periodically, so ensure that the versions and related configurations in the following .yaml files are updated to match the latest release.
Sample files (ConfigMap, Service, Deployment, and Job YAML files) for the TGI use case, running Llama-3-8B at FP8 precision, are provided for reference in the following GitHub repository:
https://github.com/dell-examples/generative-ai/tree/main/intel-XE9680-gaudi3/TGI
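To illustrate the shape of these resources, the fragment below sketches a minimal Kubernetes Deployment for a TGI-gaudi server. The image tag, model ID, resource names, and Gaudi device count are illustrative assumptions only; use the values from the repository's actual YAML files for a working deployment.

```yaml
# Illustrative sketch only -- names, image tag, and counts are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-gaudi          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-gaudi
  template:
    metadata:
      labels:
        app: tgi-gaudi
    spec:
      containers:
        - name: tgi
          image: ghcr.io/huggingface/tgi-gaudi:2.0  # pin to the current release
          args:
            - "--model-id"
            - "meta-llama/Meta-Llama-3-8B"          # model from this use case
          ports:
            - containerPort: 80
          resources:
            limits:
              habana.ai/gaudi: 1                    # Gaudi devices per pod
```

The ConfigMap, Service, and Job files in the repository follow the same pattern, wiring model configuration, network exposure, and benchmark runs around this Deployment.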
The following deployment files are provided in the TGI repository: