Driving GenAI Advancements: Dell PowerEdge R760 with the Latest 5th Gen Intel® Xeon® Scalable Processors

Fine-tuning
| Parameter | Details |
| --- | --- |
| Workload | Bio-GPT |
| Application | IPEX 2.2.0+gitad9564f6/llm_feature_branch |
| Tools/Compilers | gcc-12.2.1 |
| Middleware, frameworks, runtimes | cmake-3.20.2, findutils-4.6.0, bzip2-1.0.6, gcc-8.5.0, gcc-c++-8.5.0, gcc-toolset-12-12.0, gcc-toolset-12-runtime-12.0, git-2.39.3, gperftools-devel-2.7-9.el8, libatomic-8.5.0, libfabric-1.17.0, procps-ng-3.3.15, python3-distutils-extra-2.39, python39-3.9.16, python39-devel-3.9.16, python39-pip-20.2.4, unzip-6.0, wget-1.19.5, which-2.21, torch==2.2.0.dev20231006+cpu, torchvision==0.17.0.dev20231006+cpu, torchaudio==2.2.0.dev20231006+cpu, ninja==1.11.1, accelerate==0.23.0, sentencepiece==0.1.99, protobuf==3.20.3, datasets==2.14.5, transformers==4.35.0, sacremoses, scikit-learn, peft, gdown, llvm-16.0.6, mpi4py==3.1.4, IPEX https://github.com/intel/intel-extension-for-pytorch (branch: llm_feature_branch, commit ad9564f61aef5e6be41ff04fbc17f308d43ad300), TorchCCL https://github.com/intel/torch-ccl (tag: ccl_torch_dev_0905), DeepSpeed https://github.com/delock/DeepSpeedSYCLSupport (branch: gma/run-opt-branch, commit f0ef3eaa959617eb5d29d7fc4132fde8e6773cbe) |
| Orchestration | Kubernetes v1.27.5 |
| Command line | `mpirun -n 2 -ppn 1 -iface $FI_TCP_IFACE -genv OMP_NUM_THREADS=$OMP_NUM_THREADS -genv MASTER_ADDR=$MASTER_ADDR -genv MASTER_PORT=$MASTER_PORT -genv LD_PRELOAD=/usr/lib64/libstdc++.so.6:/usr/lib64/libtcmalloc.so:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libiomp5.so -genv TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=4294967296 -f /machinefile python3 /bio-gpt/finetune_multinode_biogpt.py --model_name_or_path "/datasets/biogpt-large" --dataset_path "/bio-gpt/raw/train.tsv,/bio-gpt/raw/valid.tsv,/bio-gpt/raw/ori_pqaa_rrp.json" --dataset_concatenation --gradient_accumulation_steps 1 --do_train --cache_dir "/biogpt_finetuned_model/cache/" --output_dir "/biogpt_finetuned_model/" --max_train_samples 5000 --per_device_train_batch_size 8 --learning_rate 4e-03 --num_train_epochs 3 --lora_alpha 64 --lora_target_modules q_proj, v_proj, k_proj, out_proj --lr_scheduler_type "linear" --use_cpu --use_ipex --ddp_backend ccl --ddp_find_unused_parameters True --bf16 --save_steps 2000` |
| Local batch size | 8 |
| Max train samples | 10500 |
| PEFT LoRA alpha | 64 |
| PEFT LoRA target modules | q_proj, k_proj, v_proj, out_proj |
| Sequence length | 512 |
| Learning rate | 0.004 |
| Epochs | 3 |
| Accumulation steps | 1 |
| Num ranks | 2 per node |
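The fine-tuning hyperparameters in the table can be collected into a small configuration helper that renders the corresponding flags for the fine-tuning script. This is a hypothetical sketch for readability, not part of the published scripts; the class and method names (`BioGptFinetuneConfig`, `to_cli_args`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BioGptFinetuneConfig:
    """Hyperparameters taken from the fine-tuning table above."""
    per_device_train_batch_size: int = 8
    learning_rate: float = 4e-3
    num_train_epochs: int = 3
    gradient_accumulation_steps: int = 1
    lora_alpha: int = 64
    lora_target_modules: List[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj", "out_proj"]
    )
    lr_scheduler_type: str = "linear"

    def to_cli_args(self) -> List[str]:
        """Render the values as CLI flags in the style of the command line above."""
        return [
            "--per_device_train_batch_size", str(self.per_device_train_batch_size),
            "--learning_rate", f"{self.learning_rate:g}",
            "--num_train_epochs", str(self.num_train_epochs),
            "--gradient_accumulation_steps", str(self.gradient_accumulation_steps),
            "--lora_alpha", str(self.lora_alpha),
            "--lora_target_modules", ",".join(self.lora_target_modules),
            "--lr_scheduler_type", self.lr_scheduler_type,
            "--bf16", "--use_cpu", "--use_ipex",
        ]

args = BioGptFinetuneConfig().to_cli_args()
```

Keeping the hyperparameters in one dataclass makes it easier to vary a single value (for example, `lora_alpha` or the learning rate) across runs while holding the rest of the launch command fixed.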