Memory
Memory is the layer that stores information worth carrying beyond a single context window. Typical examples include user preferences, ongoing task state, prior decisions, and summarized conversation history that still matters later.
[Architecture diagram: dashed lines indicate the direction of data and requests.]
Replaying all prior messages on every turn becomes expensive and noisy as sessions grow. But storing nothing makes the system behave like it has never seen the user or task before. Long-running assistants need something between full replay and total forgetting.
Early chat assistants leaned heavily on resending the recent conversation each turn. As sessions grew longer and task-oriented agents appeared, teams began separating per-call working context (the state needed for the current turn) from durable state that should survive across turns and sessions.
The system first decides which signals from the current interaction are worth persisting at all. Those signals are written into a memory store, often with different layers for user profile, session state, and task progress. Later calls retrieve only the memory relevant to the current question and insert it into working context.
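The decide-then-write-then-retrieve flow above can be sketched in a few lines. This is a minimal illustration, not a production design: the `MemoryStore` and `should_persist` names, the in-memory list backend, and the keyword-overlap relevance check are all assumptions made for the example; a real system would use embeddings or structured keys for retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    layer: str  # "profile" | "session" | "task"
    text: str

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def write(self, layer: str, text: str) -> None:
        self.items.append(MemoryItem(layer, text))

    def retrieve(self, query: str, limit: int = 3) -> list:
        # Naive relevance: rank stored items by word overlap with the query,
        # then keep only items that overlap at all.
        q = set(query.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(q & set(m.text.lower().split())),
            reverse=True,
        )
        return [m for m in scored[:limit] if q & set(m.text.lower().split())]

def should_persist(signal: str) -> bool:
    # Toy write criterion: only persist explicit preferences or decisions,
    # not incidental chatter.
    return any(k in signal.lower() for k in ("prefer", "always", "decided"))

store = MemoryStore()
for signal in ["User prefers metric units", "Small talk about the weather"]:
    if should_persist(signal):
        store.write("profile", signal)

relevant = store.retrieve("What units does the user prefer?")
```

The important property is that the write decision and the retrieval decision are separate gates: chatter never enters the store, and stored items only reach working context when they match the current question.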
Memory, RAG, and the context window all affect what past information reaches the answer, but they solve different problems. If user preferences or task state must survive across calls, use memory. If the system needs source documents for the current question, use RAG. If the conversation still fits inside one request, first ask whether the context window already solves it.
Teams use memory for personalization, long-running support, and multi-step agents. Poor memory design becomes a second source of hallucination when the system keeps stale assumptions or incorrect inferred facts. That is why write criteria, expiry rules, and user correction paths matter as much as retrieval.