📚

Retrieval-Augmented Generation

Retrieval

Retrieving external knowledge to ground generation

RAG retrieves external knowledge at request time, places that evidence into the current context, and uses it to ground the answer. Instead of relying only on what the model already contains internally, it connects the answer to relevant outside material.

Architecture Diagram

Process: User Query → Query Rewrite → Retriever → Top Chunks → Context Pack → Model → Answer

Why do you need it?

Model parameters alone are weak for fast-changing policies, private internal knowledge, or narrow operational details. When that outside evidence is missing, answers can sound fluent while still being outdated, ungrounded, or invented. The issue is often not that the model is unintelligent. It is that the right evidence never enters the call.

Why did this approach emerge?

Many teams realized quickly that retraining for every document change was unrealistic. Retrieval adapts to changing corpora much more easily and lets private content stay outside the model itself. That made RAG a default pattern for many knowledge-heavy LLM applications.

How does it work inside?

A typical flow rewrites or normalizes the question, retrieves candidate chunks, chooses the strongest evidence, packs it into the prompt, and then generates an answer from that evidence. Some systems add reranking after retrieval to improve precision, the share of selected chunks that are actually relevant, before building the final context pack. Reranking only reorders what the retriever already found, so recall, the chance of not missing relevant evidence, still depends on the retrieval step itself. The important point is that quality comes from the whole retrieval-and-generation pipeline, not from the model in isolation.
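The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the corpus, the stopword list, the token-overlap scoring, and every function name (`rewrite_query`, `retrieve`, `pack_context`, `build_prompt`, `rag_prompt`) are hypothetical. Real systems typically use embedding or BM25 retrieval, and the final model call is deliberately left out.

```python
import re

# Hypothetical toy corpus; a real system would index its own documents.
CORPUS = [
    "A refund is available within 30 days of purchase.",
    "Shipping takes 5 business days for domestic orders.",
    "A refund request requires the original receipt.",
    "The office is closed on public holidays.",
]

# Small illustrative stopword list (an assumption, not a standard set).
STOPWORDS = {"a", "the", "is", "of", "for", "on", "do", "i", "how", "get"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def rewrite_query(query):
    """Stand-in for query rewriting: trim and collapse whitespace."""
    return " ".join(query.split())


def retrieve(query, corpus, k=3):
    """Return up to k chunks that share at least one token with the query."""
    query_tokens = set(tokenize(query))
    scored = []
    for chunk in corpus:
        overlap = len(query_tokens & set(tokenize(chunk)))
        if overlap > 0:
            scored.append((overlap, chunk))
    # Stable sort: chunks with equal scores keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def pack_context(chunks, max_chars=500):
    """Number the chunks and stop before exceeding the character budget."""
    lines, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        line = f"[{i}] {chunk}"
        if used + len(line) > max_chars:
            break
        lines.append(line)
        used += len(line)
    return "\n".join(lines)


def build_prompt(question, context):
    """Place the evidence ahead of the question so the answer is grounded in it."""
    return (
        "Answer using only the evidence below. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{context}\n\nQuestion: {question}"
    )


def rag_prompt(question, corpus=CORPUS):
    """Run rewrite -> retrieve -> pack -> prompt; the model call is omitted."""
    query = rewrite_query(question)
    chunks = retrieve(query, corpus)
    return build_prompt(query, pack_context(chunks))
```

Calling `rag_prompt("How do I get a refund?")` yields a prompt whose evidence section holds only the two refund chunks. Sending that prompt to a model would be the generation step. Swapping the scoring inside `retrieve` changes which evidence reaches the model without touching the rest of the pipeline, which is one way to see why quality depends on the whole chain.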

Boundaries & Distinctions

RAG, memory, and tool use all bring outside information into the current answer, but they solve different problems. If each request needs source-of-truth documents, use RAG. If user or task state must persist across calls, use memory. If the task requires calculations or external actions, rely on tool use. Retrieved evidence alone does not automatically solve multi-step reasoning or execution.
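The distinction above can be written as a small decision helper. This is only an illustrative sketch: the `TaskNeeds` fields and the `choose_mechanisms` function are hypothetical names, and a real application would usually infer these needs from the request rather than receive them as flags.

```python
from dataclasses import dataclass


@dataclass
class TaskNeeds:
    """Hypothetical description of what a single request requires."""
    source_documents: bool = False       # needs source-of-truth documents
    persistent_state: bool = False       # user or task state across calls
    calculation_or_action: bool = False  # calculations or external actions


def choose_mechanisms(needs):
    """Map each need to the mechanism the text associates with it."""
    chosen = []
    if needs.source_documents:
        chosen.append("rag")
    if needs.persistent_state:
        chosen.append("memory")
    if needs.calculation_or_action:
        chosen.append("tool_use")
    return chosen
```

A request that must quote a policy document and then compute a fee would set both `source_documents` and `calculation_or_action`, so the helper returns both `"rag"` and `"tool_use"`. The mechanisms combine rather than compete.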

When should you use it?

Teams use RAG for product documentation, internal knowledge assistants, support, and policy retrieval. But not every question is just document lookup. Once the task requires calculations, tool execution, or long-running state, RAG alone stops being sufficient and must work with tools, memory, and evaluation.

Product documentation Q&A · Internal copilots · Support assistants · Policy lookup