The following list provides the details of our validation setup:
- Foundational models—We validated the Llama 3 8B and Llama 3 70B models.
- Model customization techniques—We used supervised fine-tuning (SFT) and Low-Rank Adaptation (LoRA). The next sections show the results of these fine-tuning methods; a minimal configuration sketch also follows this list.
- Cluster configuration—We used both Slurm and Kubernetes clusters.
- Dataset—We used the Dolly dataset from Databricks (databricks-dolly-15k). It is an open-source dataset of instruction-following records generated by thousands of Databricks employees across several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. A short example of loading this dataset follows this list.
- Time for training—Usually, data scientists train a model until it reaches convergence, a point influenced by factors such as the dataset, model complexity, and chosen hyperparameters. Our aim was not to achieve convergence in every scenario, because convergence is specific to our chosen dataset and parameters and would offer limited insight into a customer's needs. To maintain a consistent metric across all scenarios, we ran each training job for a minimum of 1,000 steps, as reflected in the configuration sketch below.
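As a point of reference, the Dolly dataset mentioned above is publicly available on the Hugging Face Hub and can be inspected with the datasets library. This sketch is for illustration only and is independent of the data pipeline our clusters used.

```python
# Minimal sketch: load databricks-dolly-15k from the Hugging Face Hub.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# Each record has an instruction, optional context, a response, and a category
# (brainstorming, classification, closed QA, generation, and so on).
print(dolly[0]["instruction"])
print(dolly[0]["category"])
print(f"Total records: {len(dolly)}")
```

To make the LoRA and step-count settings concrete, the following is a minimal sketch of a LoRA-based SFT run using the Hugging Face TRL and PEFT libraries. These are not necessarily the libraries used in our setup, and the model ID, LoRA rank, target modules, batch size, and learning rate shown here are illustrative assumptions; the only value taken directly from our experiments is the 1,000-step floor.

```python
# Minimal sketch of LoRA-based SFT, assuming Hugging Face TRL + PEFT.
# Hyperparameters are illustrative; only max_steps=1000 mirrors the
# step-count floor described above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    # Concatenate instruction, optional context, and response into one text field.
    context = f"\n{example['context']}" if example["context"] else ""
    return {"text": f"Instruction: {example['instruction']}{context}\nResponse: {example['response']}"}

dataset = dataset.map(to_text)

peft_config = LoraConfig(
    r=16,                                  # LoRA rank (assumption)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumption)
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="./llama3-dolly-lora",
    max_steps=1000,                        # train for at least 1,000 steps, as in our runs
    per_device_train_batch_size=2,         # assumption
    learning_rate=1e-4,                    # assumption
    logging_steps=50,
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",    # 8B variant; the 70B model requires a multi-node setup
    train_dataset=dataset,
    args=training_args,
    peft_config=peft_config,
)
trainer.train()
```

In practice, the same step-count floor applies whether the job runs under Slurm or Kubernetes; only the job submission and orchestration layer differs.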