Deploy and Serve models with Parameter Efficient Fine-tuning
Parameter Efficient Fine-tuning (PEFT) is a strategy for adapting pre-trained large language models to specific tasks while minimizing the computational cost and memory requirements of fine-tuning, and while maintaining or even improving performance. PEFT starts from a pre-trained language model that has already learned broad language understanding from a large corpus of text; such models are typically large and computationally expensive. Instead of fine-tuning the entire pre-trained model, PEFT freezes the base model and adds a task-specific layer, or a few task-specific layers, on top of it. These additional layers have far fewer parameters than the base model and are often combined with techniques such as knowledge distillation or few-shot learning, yielding efficient yet effective models for a range of natural language understanding tasks.
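To make the parameter savings concrete, the sketch below estimates the trainable-parameter fraction when a LoRA-style low-rank adapter of rank r is attached to a single weight matrix. This is plain back-of-envelope arithmetic, not code from the fine-tuning repository, and the 4096×4096 projection size is illustrative rather than taken from any Llama3 layer.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA-style adapter keeps the original weight frozen and trains
    # two low-rank factors: A (rank x d_in) and B (d_out x rank).
    return rank * d_in + d_out * rank

def frozen_params(d_in: int, d_out: int) -> int:
    # Parameter count of the frozen base weight matrix.
    return d_in * d_out

# Illustrative 4096x4096 projection (hypothetical size) with rank 16:
d = 4096
trainable = lora_trainable_params(d, d, 16)   # 131,072
total = frozen_params(d, d)                   # 16,777,216
print(f"trainable fraction: {trainable / total:.4%}")  # ~0.78%
```

Because only the low-rank factors receive gradients, optimizer state and gradient memory shrink by the same fraction, which is what lets large models be fine-tuned on far less accelerator memory.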
This example fine-tunes the Llama3-70B and Llama3-8B models using PEFT and then runs inference on a text prompt. It uses the Llama3 model with two task examples from the Optimum Habana library on the Hugging Face model repository. The Optimum Habana library is optimized for deep learning training and inference on first-generation Gaudi® and Gaudi®2 and offers tasks such as text generation, language modeling, question answering, and more. For all the examples and models, refer to the Optimum Habana GitHub.
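As an illustration of how such a fine-tuning run is typically launched, the fragment below invokes the LoRA language-modeling example from the Optimum Habana repository across eight Gaudi® cards. The script names (gaudi_spawn.py, run_lora_clm.py) follow the optimum-habana examples layout, but the dataset, output path, and hyperparameter values here are placeholder assumptions to adapt, not values prescribed by this paper.

```shell
# Hypothetical launch of the Optimum Habana LoRA example on 8 Gaudi cards.
# Verify the script paths and flags against the optimum-habana repository
# and substitute your own dataset and output directory before running.
python ../gaudi_spawn.py --use_mpi --world_size 8 run_lora_clm.py \
    --model_name_or_path meta-llama/Meta-Llama-3-8B \
    --dataset_name tatsu-lab/alpaca \
    --do_train \
    --output_dir ./llama3-8b-lora \
    --use_habana \
    --use_lazy_mode \
    --lora_rank 8 \
    --lora_alpha 16 \
    --num_train_epochs 3
```

The same pattern scales to Llama3-70B by pointing --model_name_or_path at the larger checkpoint and increasing the card count.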
Sample configmap.yaml and job.yaml files are provided for both the Llama3-8B and Llama3-70B models. These can be updated as per the requirement based on the number of Gaudi® cards and other hyperparameters for fine-tuning. Refer to the following GitHub repository for more information:
https://github.com/dell-examples/generative-ai/tree/main/intel-XE9680-gaudi3/fine-tuning
The Fine-tuning repository contains the sample configmap.yaml and job.yaml deployment files for each model.
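As a hedged illustration of what such a job specification might look like, the fragment below sketches a Kubernetes Job that requests Gaudi® devices through the Habana device plugin resource name. The job name, container image, and card count are placeholders; use the actual files from the dell-examples repository for real deployments.

```yaml
# Hypothetical job.yaml excerpt; replace the image, card count, and names
# with the values from the dell-examples repository for your model size.
apiVersion: batch/v1
kind: Job
metadata:
  name: llama3-8b-peft-finetune
spec:
  template:
    spec:
      containers:
        - name: finetune
          image: <your-optimum-habana-image>
          resources:
            limits:
              habana.ai/gaudi: 8   # number of Gaudi cards per pod
      restartPolicy: Never
```

The companion configmap.yaml typically carries the training hyperparameters, so card count and fine-tuning settings can be changed without rebuilding the container image.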
For additional information, refer to the following: Intel Gaudi 3 High-Performance AI Accelerator | Dell USA.