Retrieval augmented generation (RAG) is a methodology designed to enhance the accuracy and reliability of AI interactions. It combines elements of both retrieval and generative models to improve the performance of natural language processing tasks, particularly question answering and text generation.
RAG represents a significant advancement in the capabilities of LLMs by incorporating an external data retrieval step into the generative process. This approach enables the dynamic extraction of pertinent information from an extensive corpus of data, including user-specific datasets, enriching responses with accurate, current, and context-specific content. By facilitating access to and use of external knowledge bases, RAG fundamentally amplifies the utility and effectiveness of chatbots, rendering them more informative and versatile.
This approach underscores the importance of bringing AI to the data rather than the other way around. By allowing a generic LLM to access and use domain-specific data at inference time, RAG grounds the model's responses in that data without requiring the model to be retrained.
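The flow described above can be sketched end to end. The following is a minimal, illustrative example: the bag-of-words "embedding" and the toy document list are stand-ins for a real embedding model and a domain-specific knowledge base, and the assembled prompt would be sent to an LLM rather than printed.

```python
from collections import Counter
import math

# Toy corpus standing in for a domain-specific knowledge base.
DOCUMENTS = [
    "AMD Instinct accelerators are designed for AI and HPC workloads.",
    "Retrieval augmented generation grounds LLM answers in external data.",
    "Vector databases store embeddings for fast similarity search.",
]

def embed(text):
    """Illustrative bag-of-words 'embedding'; a production system would
    use a trained embedding model that produces dense vectors."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does retrieval augmented generation work?"))
```

The key design point is that only the retrieval index, not the LLM, has to change when the underlying data changes.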
The computational requirements for RAG hinge on two aspects: the computational needs of the LLM and those of the retrieval stage. The inference process for the LLM is similar to that of a chatbot, albeit typically with larger input token counts. For information retrieval, several factors, including the embedding model and the vector database, must be considered when sizing the computational infrastructure.
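One concrete sizing input is the memory footprint of the vector index itself. A rough back-of-the-envelope estimate for a flat float32 index is shown below; the 1.5x overhead multiplier is an assumption standing in for database-specific metadata and graph structures, and the corpus size and embedding dimension are hypothetical.

```python
def index_memory_gib(num_vectors, dims, bytes_per_dim=4, overhead=1.5):
    """Rough memory estimate for a float32 vector index.
    `overhead` is an assumed multiplier covering index metadata and
    auxiliary structures; real values vary by vector database."""
    raw_bytes = num_vectors * dims * bytes_per_dim
    return raw_bytes * overhead / (1024 ** 3)

# e.g. 10 million document chunks embedded at 1024 dimensions:
print(f"{index_memory_gib(10_000_000, 1024):.1f} GiB")  # → 57.2 GiB
```

Estimates like this, together with the embedding model's own inference cost, feed into how much memory and compute the retrieval side of the deployment needs.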