RAG validation
Retrieval-Augmented Generation (RAG) is a technique that augments an LLM's context with custom data instead of relying solely on its static training dataset. It addresses challenges such as an LLM's lack of knowledge about specific data, its tendency to return outdated answers or hallucinate, and the inefficiency and high cost of training and fine-tuning. RAG is particularly useful for applications such as Q&A chatbots, search augmentation, and knowledge engines that answer questions over your own data. Its benefits include up-to-date and accurate responses, fewer inaccuracies and hallucinations, domain-specific answers, and a cost-effective approach to customizing LLMs.
The following steps show how to deploy a simple RAG application using LlamaIndex:
Install the required LlamaIndex packages, including the Ollama integrations for the LLM and the embedding model:

pip install llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-ollama

Install Ollama, then pull the generation model (llama3) and the embedding model (nomic-embed-text):

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3
ollama pull nomic-embed-text
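Before building the index, it can help to confirm that the Ollama server is running and that both pulled models respond. The following quick check is a minimal sketch using the same LlamaIndex Ollama integrations installed above; the prompt and test string are arbitrary:

from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Ask the generation model for a short completion
llm = Ollama(model="llama3", request_timeout=360.0)
print(llm.complete("Reply with one word: ready"))

# Embed a short string and report the vector dimensionality
embed_model = OllamaEmbedding(model_name="nomic-embed-text")
vector = embed_model.get_text_embedding("hello world")
print(f"Embedding dimension: {len(vector)}")

If either call fails, verify that the Ollama service is running and that the model names match the ones pulled in the previous step.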
Save the following script as starter.py, and create a data directory next to it containing the documents to index (the example query and response below are based on the Paul Graham essay used in the LlamaIndex starter tutorial):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Load all documents from the local "data" directory
documents = SimpleDirectoryReader("data").load_data()

# Use the nomic-embed-text model served by Ollama for embeddings
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Use the llama3 model served by Ollama for generation
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

# Build an in-memory vector index over the documents
index = VectorStoreIndex.from_documents(
    documents,
)

# Retrieve relevant chunks and pass them to the LLM as context
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Run the script to query the indexed documents:

$ python starter.py
The author grew up writing short stories and programming on an IBM 1401 in his junior high school. He also worked on a TRS-80 microcomputer in about 1980, which he convinced his father to buy him after years of nagging.
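By default, the script rebuilds the index and re-embeds every document on each run. As an optional extension, the sketch below persists the index to a local storage directory and reloads it on later runs; the storage directory name is an arbitrary choice, and the same Ollama models must be configured before loading:

import os.path

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# The models must match the ones used when the index was built
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

PERSIST_DIR = "storage"
if not os.path.exists(PERSIST_DIR):
    # First run: build the index from the documents and persist it to disk
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the existing index without re-embedding
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()
print(query_engine.query("What did the author do growing up?"))

Persisting the index avoids repeating the embedding step, which is the most time-consuming part of indexing larger document sets.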