Introduction
Meta and Microsoft released Llama 2, an open-source LLM, to the public for research and commercial use[1]. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The fine-tuned versions use Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align to human preferences for helpfulness and safety. The model was pretrained on 2 trillion tokens of data from publicly available sources.
This release includes model weights and starting code for pretrained and fine-tuned Llama 2 language models with 7B (billion), 13B, and 70B parameters. The following table provides further detail about the models.
| Models | Fine-tuned models | Parameters |
|---|---|---|
| Llama 2-7B | Llama 2-7B-chat | 7B |
| Llama 2-13B | Llama 2-13B-chat | 13B |
| Llama 2-70B | Llama 2-70B-chat | 70B |
To run these models for inferencing, the 7B model requires one GPU, the 13B model requires two GPUs, and the 70B model requires eight GPUs. These model parallel (MP) values are set while the model is being built[2].
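Because the 7B model fits on a single GPU, it can be served with standard tooling. The following minimal sketch shows one common way to run single-GPU inference with the Hugging Face transformers library; the model ID (`meta-llama/Llama-2-7b-chat-hf`), half-precision setting, and generation parameters are illustrative assumptions and not part of the reference release described above.

```python
# Minimal single-GPU inference sketch, assuming the transformers and torch
# packages are installed and access to the gated Llama 2 checkpoint has been
# granted on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed model ID for the 7B chat model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 7B weights within a single GPU's memory
    device_map="auto",          # place the model on the available GPU
)

prompt = "Explain what model parallelism means for large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short completion; the token budget here is illustrative only.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the 13B and 70B models, the weights must be sharded across multiple GPUs to match the MP values noted above, which multi-GPU launchers such as torchrun handle in the reference implementation[2].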