AI is becoming integrated into nearly every aspect of our lives, with a real impact on how we conduct business and deliver services, from chat assistants to code development. As businesses define what their AI journey will look like, most recognize that they must start such initiatives to stay relevant.
One limitation of base large language models (LLMs) is their generic behavior: adapting them to a specific business use case requires customization techniques. In addition, once an LLM is trained, it has no access to information that appeared after its training cutoff date.
Numerous techniques can be used to overcome these limitations; one of them is retrieval-augmented generation (RAG). RAG extends the functionality of an LLM by retrieving facts from an external knowledge base, hosted in a vector database such as Redis.
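To make the RAG idea concrete, the following is a minimal, self-contained sketch of the retrieval-and-augment step. It uses a toy in-memory document list and a simple bag-of-words similarity as a stand-in embedding; a production pipeline would instead call an embedding model and query a vector database such as Redis. The document texts and function names here are illustrative, not part of the solution described in this paper.

```python
import math
import re
from collections import Counter

# Toy knowledge base; in production these documents and their embedding
# vectors would live in a vector database such as Redis.
documents = [
    "OpenShift AI supports distributed fine-tuning of large language models.",
    "Dell PowerEdge servers provide GPU-accelerated infrastructure.",
    "RAG retrieves facts from an external knowledge base at query time.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q), reverse=True)
    return ranked[:k]

def augment_prompt(query: str) -> str:
    """Prepend retrieved context so the LLM can answer from fresh facts
    rather than relying only on its training data."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The key point is that the knowledge base can be updated at any time without retraining: only the retrieved context changes, not the model's parameters.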
Another technique is to fine-tune the model on a specific dataset. This approach modifies the parameters of the base model, customizing it for a business use case such as code development, a chatbot, a digital assistant, or language translation and transcription. Fine-tuning aims to preserve the original capabilities of a pretrained model while adapting it to more specialized use cases.
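The essence of fine-tuning, continuing gradient descent on a domain-specific dataset so that the pretrained parameters shift toward the new task, can be sketched with a deliberately tiny model. This is an illustration of the mechanism only: the one-parameter model, learning rate, and dataset below are invented for the example, while real LLM fine-tuning applies the same update rule across billions of parameters on GPU-accelerated infrastructure.

```python
def fine_tune(w: float, dataset: list[tuple[float, float]],
              lr: float = 0.05, epochs: int = 100) -> float:
    """Continue training a pretrained weight w on new (x, y) pairs
    using stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in dataset:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# "Pretrained" weight from a generic task (the model approximates y = 2x).
w_base = 2.0

# Domain-specific examples drawn from the target behavior y = 2.5x.
domain_data = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

# Fine-tuning nudges the existing parameter toward the new domain
# instead of training a model from scratch.
w_tuned = fine_tune(w_base, domain_data)
```

Because fine-tuning starts from the pretrained weights rather than random initialization, it typically needs far less data and compute than training a base model from scratch.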
Before deciding which approach to use, consider the pros and cons of training a base model from scratch, fine-tuning an existing model, and using RAG:
This technical white paper provides an example of fine-tuning an LLM on Dell Technologies AI-optimized infrastructure, demonstrating the robust, developer-friendly ecosystem provided by Red Hat OpenShift AI. The use case shows how administrators and operators can manage the life cycle of a large language model, from model customization through distributed fine-tuning to model serving and inferencing, to add business value. The resulting ecosystem solution is developer-friendly, providing a centralized hub of supported software components that greatly increases the downstream productivity of large language models.