RAG stands for Retrieval-Augmented Generation, an approach that combines a retrieval system with a generative AI model. A RAG architecture refers to the design of systems that implement this approach, typically consisting of two main components: a retriever and a generator.

The retriever (often a vector search over a knowledge base or documents) is responsible for fetching relevant information – for example, pulling the top-k text passages from a company's document repository that relate to a user's query. The generator is a large language model (LLM) or other generative model that takes the query plus the retrieved documents as input and produces a final answer or output.

By integrating these, a RAG architecture grounds the generative model's output in up-to-date or domain-specific knowledge. Essentially, instead of relying solely on what the LLM memorized during training, it can consult an external knowledge source on the fly. The architecture is powerful for applications like question answering, where the LLM can cite specific retrieved facts, or any scenario where the knowledge cutoff of the model needs to be extended (for example, an LLM trained on data up to 2021 can use retrieval to answer questions about 2023). This setup helps reduce hallucinations and improve factual accuracy.

In summary, a RAG architecture is a pipeline where a query first goes through a retrieval step to gather evidence, and then a generative step that uses that evidence to compose a context-aware, informed response. It marries information retrieval with text generation, enabling more reliable and context-rich AI systems.
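The retrieve-then-generate pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the corpus, the query, and the bag-of-words similarity are all stand-ins (a real system would use learned vector embeddings and an actual LLM call in place of the `generate` stub).

```python
import math
from collections import Counter

# Toy document repository (illustrative data, not from the article).
DOCUMENTS = [
    "The 2023 annual report shows revenue grew 12 percent year over year.",
    "Employees may carry over up to five unused vacation days each year.",
    "The company headquarters relocated to Austin, Texas in 2023.",
]

def embed(text):
    """Bag-of-words term counts as a stand-in for a learned embedding."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text)
    return Counter(cleaned.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Retriever: return the top-k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query, passages):
    """Generator stub: in a real RAG system this prompt would be sent
    to an LLM, grounding its answer in the retrieved passages."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "Where did the headquarters move in 2023?"
prompt = generate(query, retrieve(query, k=1))
```

Here the retrieval step narrows the model's input to the evidence most relevant to the question, so the generator composes its answer from that context rather than from memorized training data alone.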