Conceptly

Understand AI Engineering visually

Explore each concept's architecture through animated diagrams. Click a card to dive deeper.

๐Ÿ“Input๐ŸงฎToken Budget๐Ÿ’ฌReply
๐Ÿงฎ

Tokens & Context Window

The token budget a model can see in one call

Tokens are the small units a model reads, and the context window is the working space available in a single call. System instructions, the current question, earlier turns, retrieved documents, tool results, and even the answer being generated all have to fit inside that same space.
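The budgeting idea can be sketched in a few lines of Python. This is a minimal sketch, not a real tokenizer: it approximates token counts with whitespace word counts, and trims the oldest turns first so the newest history still fits alongside the system prompt and question.

```python
def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: ~1 token per whitespace word.
    return len(text.split())

def fit_to_window(system: str, turns: list[str], question: str, window: int) -> list[str]:
    """Keep the newest turns that still fit next to the system prompt and question."""
    budget = window - count_tokens(system) - count_tokens(question)
    kept: list[str] = []
    for turn in reversed(turns):      # walk newest-first
        cost = count_tokens(turn)
        if cost > budget:
            break                     # this turn (and anything older) is dropped
        kept.insert(0, turn)          # restore chronological order
        budget -= cost
    return kept
```

Everything else in the call — retrieved documents, tool results, the answer itself — competes for the same `budget`, which is why trimming strategies like this sit at the heart of context management.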

Task → Prompt → Behavior

Prompt Engineering

Designing instructions that steer model behavior

Prompt engineering is the work of defining what the model should do, what standards it should follow, and what style or format it should keep. It is less about clever phrasing and more about making the task, criteria, and boundaries unambiguous.
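A sketch of that mindset: instead of one clever sentence, the prompt is assembled from explicit task, criteria, and format sections. The field names here are illustrative, not a standard.

```python
def build_prompt(task: str, criteria: list[str], output_format: str) -> str:
    """Make the task, standards, and expected shape explicit and unambiguous."""
    lines = [f"Task: {task}", "Criteria:"]
    lines += [f"- {c}" for c in criteria]
    lines.append(f"Output format: {output_format}")
    return "\n".join(lines)
```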

Sources → Context Builder → Final Context

Context Engineering

Designing which information enters the current call

Context engineering is the design discipline of selecting and assembling only the information the current call needs. When retrieved documents, chat history, memory, and tool results matter more than the wording of one prompt, this becomes the central layer.
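A minimal sketch of a context builder, assuming each candidate source already carries a relevance score: take the highest-scoring pieces that still fit a character budget, skip the rest, and assemble one final context block.

```python
def build_context(sources: list[tuple[str, float]], budget_chars: int) -> str:
    """Pick the highest-scoring pieces that fit the budget, then join them."""
    chosen: list[str] = []
    used = 0
    for text, score in sorted(sources, key=lambda s: s[1], reverse=True):
        if used + len(text) > budget_chars:
            continue                  # too big for what's left; try smaller pieces
        chosen.append(text)
        used += len(text)
    return "\n---\n".join(chosen)
```

Real systems score by embedding similarity and budget in tokens rather than characters, but the selection-then-assembly shape is the same.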

Task → Schema → JSON

Structured Output

Receiving results through a schema instead of free text

Structured output means asking the model for results that fit a schema or object shape instead of unconstrained prose. The goal is not just readability for humans. It is predictable consumption by code.
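The "predictable consumption by code" part looks like this: parse the model reply as JSON and check it against the expected shape before anything downstream touches it. The `SCHEMA` here is a toy stand-in for a real schema library.

```python
import json

SCHEMA = {"title": str, "priority": int}   # toy schema: field -> expected type

def parse_reply(raw: str) -> dict:
    """Parse the model reply and verify it matches the expected object shape."""
    data = json.loads(raw)
    for field, ftype in SCHEMA.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    return data
```

When validation fails, the application can retry or fall back instead of passing malformed prose along.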

Goal → Tool Call → System

Tool Use

Letting the model call external systems for data or actions

Tool use is the pattern where the model expresses an intention to call an external function or API, and the application runtime executes that call on the model's behalf. The model decides what needs to be queried or executed. The runtime keeps control over side effects.
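The split of responsibilities can be sketched as a tool registry plus a dispatcher: the model emits an intent (a name and arguments), and the runtime decides whether and how to execute it. The weather function is a hypothetical stand-in for a real API call.

```python
def get_weather(city: str) -> str:
    return f"Sunny in {city}"          # stand-in for a real external API call

TOOLS = {"get_weather": get_weather}   # the runtime's allowlist of callable tools

def run_tool_call(call: dict) -> str:
    """The model proposes; the runtime executes and keeps control over side effects."""
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)
```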

Text → Vector → Similar Match

Embeddings

Representing meaning as vectors

Embeddings convert text into numeric vectors so systems can compare meaning rather than just exact words. If two passages mean similar things, their vectors should end up closer together in the embedding space.
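"Closer together" is usually measured with cosine similarity, which a few lines of plain Python can show (real systems use an embedding model to produce the vectors and a vector store to compare them at scale):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```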

Source Doc → Chunks → Index

Chunking & Indexing

Preparing documents as searchable chunks and indexes

Chunking and indexing are the preparation steps that turn raw documents into retrievable units and store them so search can find them later. If RAG is the runtime pattern, chunking and indexing are the offline data layer that RAG depends on.
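Both steps fit in a short sketch: split a document into overlapping word windows, then build an inverted index mapping each word to the chunks that contain it. Production systems chunk by tokens or structure and index with embeddings, but the two-phase shape is the same.

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size word windows, overlapping to preserve context."""
    words = text.split()
    step = size - overlap              # assumes size > overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def build_index(chunks: list[str]) -> dict[str, set[int]]:
    """Inverted index: word -> ids of the chunks that contain it."""
    index: dict[str, set[int]] = {}
    for i, c in enumerate(chunks):
        for word in set(c.lower().split()):
            index.setdefault(word, set()).add(i)
    return index
```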

โ“Question๐Ÿ“šRetrieve๐Ÿ’ฌGrounded Answer
๐Ÿ“š

RAG

Retrieving external knowledge to ground generation

RAG retrieves external knowledge at request time, places that evidence into the current context, and uses it to ground the answer. Instead of relying only on what the model already contains internally, it connects the answer to relevant outside material.
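The runtime loop can be sketched with word overlap standing in for real vector search: score documents against the question, keep the top matches, and place them into the prompt as evidence.

```python
def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Score docs by words shared with the question; keep the top k."""
    q = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def grounded_prompt(question: str, docs: list[str]) -> str:
    """Assemble a prompt that ties the answer to retrieved evidence."""
    evidence = "\n".join(retrieve(question, docs))
    return f"Use only this evidence:\n{evidence}\n\nQuestion: {question}"
```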

Goal → Decision Loop → Outcome

Agent Workflow

A model-tool loop for solving goals over multiple steps

An agent workflow is the execution pattern where a model moves toward a goal by repeatedly choosing actions, observing results, and deciding what to do next. The important idea is not the label "agent" itself. It is the presence of state, iteration, and step-by-step control.
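Stripped to its skeleton, the loop is just state, a decision, and an observation, repeated until the goal is met or a step limit runs out. Here a plain function plays the model's role of choosing the next action; in a real agent, that decision comes from an LLM call.

```python
def run_agent(goal: int, choose_action, max_steps: int = 10) -> tuple[int, int]:
    """Loop: pick an action from current state, apply it, stop at the goal."""
    state, steps = 0, 0
    while state != goal and steps < max_steps:
        action = choose_action(state, goal)   # the "model" decides the next step
        state += action                       # execute and observe the result
        steps += 1
    return state, steps
```

The `max_steps` cap is the kind of step-by-step control that separates a workflow from an unbounded loop.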

Session → Memory Store → Next Turn

Memory

A persistence layer for carrying state across calls

Memory is the layer that stores information worth carrying beyond a single context window. Typical examples include user preferences, ongoing task state, prior decisions, and summarized conversation history that still matters later.
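At its simplest, the persistence layer is a keyed store that outlives any single call. This sketch keeps facts per user in process memory; a real system would back it with a database and decide what is worth remembering and summarizing.

```python
class MemoryStore:
    """Keep facts worth carrying beyond one context window, keyed per user."""
    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def remember(self, user: str, fact: str) -> None:
        self._facts.setdefault(user, []).append(fact)

    def recall(self, user: str) -> list[str]:
        return self._facts.get(user, [])
```

On each new turn, `recall` feeds the relevant facts back into the context window.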

Dataset → Judge → Score

LLM Evals

A repeatable testing system for LLM quality

Evals are the repeatable measurement system used to compare LLM behavior over time. They make it possible to tell whether prompt, model, retrieval, or workflow changes actually improved the system or just moved failures around.
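The repeatable part is just a harness: run every case in a fixed dataset through the system, score each output with a judge, and report one comparable number. Here the system and judge are plain functions standing in for an LLM call and an LLM-as-judge or exact-match check.

```python
def run_eval(dataset: list[tuple[str, str]], system, judge) -> float:
    """Score each (input, expected) pair and report the pass rate."""
    passed = sum(judge(system(inp), expected) for inp, expected in dataset)
    return passed / len(dataset)
```

Rerunning the same harness before and after a prompt or retrieval change is what tells you whether the change helped or just moved failures around.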

Input → Policy Gate → Safe Output

Guardrails

Runtime controls for allowed behavior and safe fallbacks

Guardrails are the runtime controls that define what the system may accept, do, and return. They are less about asking the model to behave well and more about enforcing boundaries around actual inputs, actions, and outputs.
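Enforcement, not persuasion, is the point: gates around the actual input and output. This sketch uses a crude blocked-term list as the policy; real guardrails use classifiers, schema checks, and action allowlists, but the gate shape is the same.

```python
BLOCKED_TERMS = {"password", "ssn"}    # toy policy; real systems use classifiers

def guard_input(text: str) -> str:
    """Reject inputs the system may not accept."""
    if any(term in text.lower() for term in BLOCKED_TERMS):
        raise ValueError("input violates policy")
    return text

def guard_output(text: str, fallback: str = "Sorry, I can't share that.") -> str:
    """Return a safe fallback instead of a disallowed output."""
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return fallback
    return text
```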

Request → Trace → Insight

LLM Observability

Operational tracing for explaining LLM system behavior

Observability is the operational tracing layer that shows what actually happened inside an LLM system. Instead of looking only at the final answer, it connects the request, prompt snapshot, retrieval results, tool calls, validations, output, and user feedback into one explainable trace.
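The core data structure is a trace: one record per request that collects every event along the way. This is a minimal in-process sketch; real setups export such traces to an observability backend.

```python
import time

class Trace:
    """Collect the events of one request into a single explainable record."""
    def __init__(self, request: str) -> None:
        self.request = request
        self.events: list[dict] = []

    def log(self, kind: str, detail: str) -> None:
        self.events.append({"kind": kind, "detail": detail, "ts": time.time()})
```

Logging the prompt snapshot, retrieval results, tool calls, and validations as events is what lets you later explain why a particular answer came out the way it did.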