Distributed inferencing
| Component | Details |
|---|---|
| Workload | Meta Llama-2-7B, Llama-2-13B, and Falcon-40B models at BF16 precision |
| Application | IPEX (intel_extension_for_pytorch==2.1.0) |
| Tools/Compilers | gcc 12.2.1 |
| Middleware, frameworks, runtimes | cmake-3.20.2, findutils-4.6.0, bzip2-1.0.6, gcc-8.5.0, gcc-c++-8.5.0, gcc-toolset-12-12.0, gcc-toolset-12-runtime-12.0, git-2.39.3, gperftools-devel-2.7-9.el8, libatomic-8.5.0, libfabric-1.18.0, procps-ng-3.3.15, python3-distutils-extra-2.39, python39-3.9.18, python39-devel-3.9.18, python39-pip-20.2.4, unzip-6.0, wget-1.19.5, which-2.21, intel-oneapi-openmp-2023.2.1, PSM3 (https://downloadmirror.intel.com/789689/IntelEth-FS.RHEL88-x86_64.11.5.1.1.1.tgz), ninja==1.11.1.1, accelerate==0.25.0, sentencepiece==0.1.99, protobuf==4.25.1, datasets==2.15.0, transformers==4.31.0, wheel==0.42.0, PyTorch 2.1 (torch==2.1.0), IPEX (intel_extension_for_pytorch==2.1.0), neural-compressor==2.3.1, TorchCCL (--branch v2.1.0+cpu, https://github.com/intel/torch-ccl), mpi4py==3.1.4, DeepSpeed (--branch gma/run-opt-branch, https://github.com/delock/DeepSpeedSYCLSupport) |
| Orchestration | Kubernetes v1.27.5 |
| Command line | `mpirun -n 2 -ppn 1 -iface net1 -genv OMP_NUM_THREADS=31 -genv MASTER_ADDR=$MASTER_ADDR -genv MASTER_PORT=$MASTER_PORT -genv LD_PRELOAD=/usr/lib64/libstdc++.so.6:/usr/lib64/libtcmalloc.so:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin/libiomp5.so -genv TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD=4294967296 -f /machinefile python /datasets/run_generation_with_deepspeed.xmr.py --benchmark -m $MODEL_NAME --dtype bfloat16 --ipex --deployment-mode --token-latency --batch-size (1,2,4,8) --input-tokens (256,1024,2048) --num-iter 100 --num-warmup 10 --greedy --max-new-tokens 256` |
| Warm-up steps | 10 |
| Steps | 100 |
| Batch size | 1, 2, 4, 8 |
| Beam width | 1 (greedy search) |
| Input token size | 256, 1024, 2048 |
| Output token size | 256 |
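The command line above parameterizes two sweeps: batch size (1, 2, 4, 8) and input token length (256, 1024, 2048), for twelve runs per model. A minimal dry-run sketch of that sweep is below; the wrapper itself, the default `MODEL_NAME`, and the omission of the `LD_PRELOAD`/`TCMALLOC` `-genv` options (dropped here for brevity) are illustrative assumptions, not part of the documented setup:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: builds one mpirun command per
# (batch size, input length) combination from the table above.
# Nothing is executed; MASTER_ADDR, MASTER_PORT, and /machinefile are
# assumed to be set up per the white paper's environment.
MODEL_NAME=${MODEL_NAME:-meta-llama/Llama-2-7b-hf}  # assumed default

cmds=()
for bs in 1 2 4 8; do
  for in_tok in 256 1024 2048; do
    # Some -genv options from the documented command are omitted for brevity.
    cmds+=("mpirun -n 2 -ppn 1 -iface net1 -genv OMP_NUM_THREADS=31 \
-genv MASTER_ADDR=\$MASTER_ADDR -genv MASTER_PORT=\$MASTER_PORT \
-f /machinefile python /datasets/run_generation_with_deepspeed.xmr.py \
--benchmark -m $MODEL_NAME --dtype bfloat16 --ipex --deployment-mode \
--token-latency --batch-size $bs --input-tokens $in_tok \
--num-iter 100 --num-warmup 10 --greedy --max-new-tokens 256")
  done
done

printf '%s\n' "${cmds[0]}"            # preview the first command
echo "${#cmds[@]} benchmark runs queued"
```

Each command reuses the fixed settings from the table (100 steps, 10 warm-up steps, greedy search, 256 output tokens) and varies only the two swept parameters.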