Memory
Memory is the layer that stores information worth carrying beyond a single context window. Typical examples include user preferences, ongoing task state, prior decisions, and summarized conversation history that still matters later.
[Architecture diagram: dashed lines indicate the direction of data and requests.]
Replaying all prior messages on every turn becomes expensive and noisy as sessions grow. But storing nothing makes the system behave like it has never seen the user or task before. Long-running assistants need something between full replay and total forgetting.
Early chat assistants leaned heavily on resending the recent conversation each turn. As sessions grew longer and task-oriented agents appeared, teams began separating per-call working context (the state needed for the current turn) from durable state that should survive across turns and sessions.
The system first decides which signals from the current interaction are worth persisting at all. Those signals are written into a memory store, often with different layers for user profile, session state, and task progress. Later calls retrieve only the memory relevant to the current question and insert it into working context.
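The decide-then-write-then-retrieve flow above can be sketched in a few lines. This is a minimal illustration, not a production design: the `MemoryStore` and `should_persist` names, the in-memory list backend, and the keyword-overlap relevance check are all assumptions made for the example; a real system would use embeddings or structured keys for retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    layer: str  # "profile" | "session" | "task"
    text: str

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def write(self, layer: str, text: str) -> None:
        self.items.append(MemoryItem(layer, text))

    def retrieve(self, query: str, limit: int = 3) -> list:
        # Naive relevance: rank stored items by word overlap with the query,
        # then keep only items that overlap at all.
        q = set(query.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(q & set(m.text.lower().split())),
            reverse=True,
        )
        return [m for m in scored[:limit] if q & set(m.text.lower().split())]

def should_persist(signal: str) -> bool:
    # Toy write criterion: only persist explicit preferences or decisions,
    # not incidental chatter.
    return any(k in signal.lower() for k in ("prefer", "always", "decided"))

store = MemoryStore()
for signal in ["User prefers metric units", "Small talk about the weather"]:
    if should_persist(signal):
        store.write("profile", signal)

relevant = store.retrieve("What units does the user prefer?")
```

The important property is that the write decision and the retrieval decision are separate gates: chatter never enters the store, and stored items only reach working context when they match the current question.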
Memory, RAG, and the context window all affect what past information reaches the answer, but they solve different problems. If user preferences or task state must survive across calls, use memory. If the system needs source documents for the current question, use RAG. If the conversation still fits inside one request, first ask whether the context window already solves it.
Teams use memory for personalization, long-running support, and multi-step agents. Poor memory design becomes a second source of hallucination when the system keeps stale assumptions or incorrect inferred facts. That is why write criteria, expiry rules, and user correction paths matter as much as retrieval.