Training a model can be complex, and this paper provides only a high-level view of the process. For detailed information, see Using Synthetic Data Generation to Fine Tune a model from the Dell Enterprise Hub. You also need to know how to deploy AI models using Kubernetes. For more information, see Simplifying AI: Dell Enterprise Hub Enables Kubernetes Deployment for AI Models.
Access to Dell Enterprise Hub requires Hugging Face login credentials. Perform the following steps:
A Model Catalog page opens, displaying all the models that are available in Dell Enterprise Hub.
3. Select the Train filter in the upper right of the Catalog page to narrow the display to models that are available to be fine-tuned, as shown in the following example:
Select a model from the catalog that fits your platform and business needs. Training processes can be GPU-intensive, so it is important to select a model that fits your hardware specifications.
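As a rough aid for matching a model to the available hardware, the memory needed just to hold the model weights can be estimated from the parameter count and the numeric precision. The sketch below is a back-of-the-envelope calculation, not a Dell or Hugging Face sizing tool. The function name `weight_memory_gb` is hypothetical, and the estimate deliberately ignores activations, gradients, optimizer state, and adapter weights, so real training needs more memory than these figures suggest.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes.

    Assumption: every parameter is stored at the same precision. Training
    overhead (activations, gradients, optimizer state) is NOT included.
    """
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Roughly 8 billion parameters, as in an 8B-class model.
    params = 8e9
    print("bf16 weights:", weight_memory_gb(params, 16), "GB")
    print("int4 weights:", weight_memory_gb(params, 4), "GB")
```

The difference between the two figures illustrates why a quantized, parameter-efficient setup can fit on GPUs that could not hold the same model at full training precision.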
When you open the model card, you will find that a model that supports training has two cards: train and deploy. Training the model consists of three tasks: loading the model, training it on the selected data, and then redeploying the model with the new training.
Perform the following steps:
A model card opens that is similar to the sample card shown in the following figure:
Column mapping links each input to its corresponding output so that the model knows how to process the input data and what to expect from the output. For more information, see Understanding Column Mapping. All the files must be in CSV format, but the layout of each dataset differs depending on your use case. Depending on the trainer, the data must have the following columns and names:
Regardless of which trainer you are using, ensure accurate mapping for correct training. To do this:
The following code snippet shows a sample Kubernetes Job manifest for SFT training of a model:
apiVersion: batch/v1
kind: Job
metadata:
  name: autotrain-dell-sft
spec:
  template:
    metadata:
      name: autotrain-dell
      labels:
        app: autotrain-dell
        hf.co/model: meta-llama-meta-llama-3.1-8b
    spec:
      nodeSelector:
        kubernetes.io/hostname: node032
        # nvidia.com/gpu.product: NVIDIA-L40S
      containers:
        - name: trl-container
          image: registry.dell.huggingface.co/enterprise-dell-training-meta-llama-meta-llama-3.1-8b
          args:
            - "--model=/app/model"
            - "--project-name=fine-tune"
            - "--data-path=/app/data"
            - "--text-column=text"
            - "--trainer=sft"
            - "--epochs=3"
            - "--mixed_precision=bf16"
            - "--batch-size=2"
            - "--peft"
            - "--quantization=int4"
            - "--merge_adapter"
          env:
            - name: ACCELERATE_LOG_LEVEL
              value: "INFO"
            - name: TRANSFORMERS_LOG_LEVEL
              value: "INFO"
            - name: TQDM_POSITION
              value: "-1"
          resources:
            requests:
              nvidia.com/gpu: 2
            limits:
              nvidia.com/gpu: 2
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - name: data-mount
              mountPath: /app/data
              readOnly: true
            - name: output-mount
              mountPath: /app/autotrain
              readOnly: false
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 32Gi
        - name: data-mount
          nfs:
            server: f600-21.ai.lab
            path: /ifs/data/huggingface/phase-2/autotrain-example-datasets
        - name: output-mount
          nfs:
            server: f600-21.ai.lab
            path: /ifs/data/huggingface/phase-2/fine_tunned_model/llama-3.1-8b/760xa/2xL40S
      restartPolicy: "Never"
Because AutoTrain is embedded in the container, no additional configuration is required. When the model is deployed, it is immediately ready for training. The Dell AI Solutions team applied the Alpaca dataset from Hugging Face.
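The manifest above passes `--text-column=text` and mounts the dataset at `/app/data`, so the CSV file presumably needs a column named `text` for the SFT trainer. Before starting a GPU job, it can be worth checking the header locally. The following is a minimal sketch using only the Python standard library. The helper name `missing_columns` is hypothetical and is not part of AutoTrain, and the list of required columns is an assumption that you should adjust to match your trainer and column mapping.

```python
import csv
import io


def missing_columns(csv_text: str, required: list[str]) -> list[str]:
    """Return the required column names that are absent from the CSV header.

    Only the first row (the header) is inspected; data rows are not validated.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Illustrative in-memory data; replace with the contents of your own file.
    sample = "text\nBelow is an instruction that describes a task.\n"
    print(missing_columns(sample, ["text"]))
```

An empty list means every required column is present. Any names that come back should be added or renamed in the CSV, or the column argument in the manifest should be changed to match.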
6. Verify that you have provided the correct path to the trained model and hardware specifications.
7. Deploy the fine-tuned model with Kubernetes or Docker to begin using this model.
The preceding high-level implementation guidance covers the key concepts that are necessary for fine-tuning models. Training models is challenging to implement because it often requires deep knowledge of coding and LLMs. The partnership between Dell Technologies and Hugging Face has simplified the process of training models. Dell’s hardware validations combined with Hugging Face’s AutoTrain tool and platform bring one of the most technically challenging aspects of generative AI to the enterprise in an easy, open manner.