Use Case 5 - Impact of Running Models with Different Quantization on the XE9680
This use case focuses on understanding the impact of running models with different quantizations on the XE9680. Specifically, we tested the Llama 3 70B model using both FP16 and FP8 quantization to compare their performance metrics.
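To put the FP16 versus FP8 comparison in context, a back-of-envelope calculation (not taken from the paper) shows why precision matters for a 70B-parameter model: halving the bytes per parameter halves the GPU memory needed just to hold the weights, before accounting for activations, KV cache, or runtime overhead.

```python
# Rough GPU memory required for the weights alone of a 70B-parameter model
# at different precisions. Activations, KV cache, and framework overhead
# are excluded; figures are illustrative, not measured on the XE9680.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

params = 70e9  # Llama 3 70B

fp16 = weight_memory_gb(params, 2.0)  # FP16: 2 bytes per parameter
fp8 = weight_memory_gb(params, 1.0)   # FP8:  1 byte per parameter

print(f"FP16 weights: {fp16:.0f} GB")  # 140 GB
print(f"FP8 weights:  {fp8:.0f} GB")   # 70 GB
```

The smaller FP8 footprint leaves more GPU memory for the KV cache, which is one reason quantization can improve throughput in addition to reducing the number of GPUs required.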
Goals:
- Compare the performance metrics of the Llama 3 70B model under FP16 and FP8 quantization on the XE9680.
Note: This paper does not detail the effect of quantization on model weights or response accuracy. As a general rule, larger models provide more accurate responses.
Note: Unless otherwise stated, all use cases use the default NIM quantization (FP8).
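As a hedged sketch of how a non-default precision can be selected, the NVIDIA NIM container exposes model profiles that encode precision and parallelism choices; pinning one overrides the automatic (FP8) selection. The image tag, cache path, and profile ID below are illustrative placeholders, not values taken from this paper.

```shell
# Hypothetical sketch: launch the Llama 3 70B NIM container with an
# explicitly pinned model profile instead of the automatically selected
# default. Run the container's `list-model-profiles` utility first to see
# the profile IDs available on your hardware; the ID below is a placeholder.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE="<profile-id-from-list-model-profiles>" \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-70b-instruct:latest
```

Once the container is up, inference requests go to the OpenAI-compatible endpoint on port 8000 regardless of which precision profile was selected, so the client side of a benchmark is unchanged between FP16 and FP8 runs.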