The proposed approach to solving these business challenges includes hardware and software components as well as an overall process for continuous improvement.
Major process steps include training a base coding large language model on additional data, then deploying the new model for testing and for use as a code generation and explanation system.
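As an illustration of the training step, the sketch below fine-tunes a base Code Llama model on additional code using parameter-efficient LoRA adapters via the Hugging Face transformers, peft, and datasets libraries. This is one common fine-tuning technique, not necessarily the one used in this solution, and the model name, dataset file, and hyperparameters are placeholders.

```python
# Hypothetical fine-tuning sketch: LoRA adapters on a base Code Llama model.
# Assumes the Hugging Face transformers, peft, and datasets packages;
# "internal_code.jsonl" stands in for an organization's own training data.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

BASE = "codellama/CodeLlama-7b-hf"  # see the model sizing discussion below

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# LoRA trains small adapter matrices while the base weights stay frozen,
# keeping GPU memory needs far below those of full fine-tuning.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# The "additional data": the organization's own code, tokenized for
# causal language modeling.
dataset = load_dataset("json", data_files="internal_code.jsonl")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="codellama-custom",
                           num_train_epochs=1,
                           per_device_train_batch_size=1,
                           learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("codellama-custom")  # adapter weights, ready to deploy for testing
```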
The first step is choosing the model. Large language models such as those in the Code Llama family are desirable because they have been trained on vast libraries of code for both infilling and new code generation. Coding patterns that are widely used and follow established standards carry greater weight in the model's tuned parameters, so the resulting code recommendations tend to follow widely accepted industry standards.
In general, the more tuned parameters an LLM has, the higher its accuracy. Hence, for the best accuracy, the 70B-parameter version of Code Llama is recommended. However, if GPU and compute resources are limited, or if faster results are wanted at the cost of some accuracy, the 7B, 13B, or 34B models can be used.
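To make that tradeoff concrete, the illustrative helper below picks the largest Code Llama variant that fits the aggregate GPU memory it detects. The per-model memory figures are rough assumptions (fp16 weights at about 2 bytes per parameter, plus working overhead), not measured requirements.

```python
# Illustrative model sizing helper; memory thresholds are rough assumptions.
import torch

# Hugging Face model IDs mapped to approximate fp16 memory needs in GB,
# listed largest first (dict order is preserved in Python 3.7+).
CODE_LLAMA_GB = {
    "codellama/CodeLlama-70b-hf": 160,
    "codellama/CodeLlama-34b-hf": 80,
    "codellama/CodeLlama-13b-hf": 30,
    "codellama/CodeLlama-7b-hf": 16,
}

def pick_code_llama() -> str:
    """Return the largest variant that fits the GPUs visible to this process."""
    total_gb = sum(
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ) / 1e9
    for model_id, needed_gb in CODE_LLAMA_GB.items():
        if total_gb >= needed_gb:
            return model_id
    raise RuntimeError("Not enough GPU memory for any variant; "
                       "consider quantization or a smaller model.")

print(pick_code_llama())
```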
Note: As the state of the art in large language models advances over time, the implementation should include the capability to continually evaluate new models and technology to drive improvements in output.
For the hardware in this solution, a scalable, distributed, multi-tier, multi-purpose configuration is proposed. This hardware division of labor includes the following major components and characteristics:
Red Hat OpenShift AI, running on top of the Red Hat OpenShift Container Platform, provides a robust, developer-friendly ecosystem when deployed on Dell Technologies AI-optimized infrastructure. With this hardware and software, we can take advantage of multi-GPU training and inferencing by using the newly released distributed workloads features of Red Hat OpenShift AI.
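For example, the distributed workloads stack lets a data scientist request a multi-GPU Ray cluster from a workbench notebook through the CodeFlare SDK. The sketch below is illustrative only: the cluster name, namespace, token, and resource figures are placeholders, and configuration parameter names vary between codeflare-sdk releases, so check the SDK version bundled with your OpenShift AI installation.

```python
# Illustrative CodeFlare SDK usage; all names and figures are placeholders,
# and field names may differ in your codeflare-sdk version.
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication

# Authenticate against the OpenShift API (placeholder token and server).
auth = TokenAuthentication(
    token="sha256~<placeholder>",
    server="https://api.cluster.example.com:6443",
    skip_tls=False,
)
auth.login()

# Request a Ray cluster with four GPU-backed workers.
cluster = Cluster(ClusterConfiguration(
    name="codellama-finetune",
    namespace="ai-workloads",
    num_workers=4,      # Ray worker pods
    num_gpus=1,         # GPUs per worker
    min_cpus=8, max_cpus=16,
    min_memory=64, max_memory=128,  # GiB per worker
))

cluster.up()          # create the RayCluster resource on OpenShift
cluster.wait_ready()  # block until the head and workers are running
print(cluster.details())
```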
Distributed workloads provide the following benefits[1]:
The ongoing functions of the solution include training data and model storage and curation, the training framework, and the infrastructure for testing and production inferencing. Finally, the overall continuous improvement process can be loosely defined as the following steps: