Chapters
Executive summary
Introduction
Experiment Setup
Download the Llama 2 Model
Deploy the model
Experiment Results
Conclusion
References
This document describes how to deploy the Meta Llama 2 7B-parameter model and run inference on it using a single NVIDIA A100 GPU with 40 GB of memory.
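As a rough sanity check on why a 7B-parameter model can fit on a single 40 GB GPU, the sketch below estimates the memory needed for the model weights alone at a few common numeric precisions. It is an illustration under stated assumptions, not the sizing method used in this document: the parameter count is taken as a nominal 7 billion, the precisions listed are examples, the helper name `weight_memory_gb` is hypothetical, and the KV cache, activations, and framework overhead are not counted.

```python
# Back-of-the-envelope estimate of GPU memory used by model weights only.
# Assumptions (not taken from the source document): a nominal 7e9 parameters,
# decimal gigabytes (1 GB = 1e9 bytes), and no allowance for KV cache,
# activations, or framework overhead.

GB = 1e9  # bytes per decimal gigabyte


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory footprint of the weights in GB."""
    return num_params * bytes_per_param / GB


NUM_PARAMS = 7e9      # nominal "7B" parameter count (assumption)
GPU_MEMORY_GB = 40.0  # single NVIDIA A100 with 40 GB

# Example precisions and their storage size per parameter.
PRECISIONS = {"fp32": 4, "fp16/bf16": 2, "int8": 1}

for name, nbytes in PRECISIONS.items():
    needed = weight_memory_gb(NUM_PARAMS, nbytes)
    headroom = GPU_MEMORY_GB - needed
    print(f"{name:>9}: weights ~{needed:.0f} GB, headroom ~{headroom:.0f} GB")
```

Under these assumptions, 16-bit weights take about 14 GB, which leaves room on a 40 GB device for the runtime memory that inference also needs. The actual footprint depends on the serving stack, batch size, and sequence length.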