Understand AI Engineering visually
Explore each concept's architecture through animated diagrams. Click a card to dive deeper.
Tokens & Context Window
The token budget a model can see in one call
Tokens are the small units a model reads, and the context window is the working space available in a single call. System instructions, the current question, earlier turns, retrieved documents, tool results, and even the answer being generated all have to fit inside that same space.
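The shared-budget idea can be sketched as a simple pre-flight check. The 4-characters-per-token ratio, window size, and answer reserve below are illustrative assumptions, not any specific model's real tokenizer or limit.

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(parts: list[str], window: int = 8000,
                   reserve_for_answer: int = 1000) -> bool:
    """Instructions, history, documents, and the answer share one window."""
    used = sum(approx_tokens(p) for p in parts)
    return used + reserve_for_answer <= window
```

In practice the same check decides what to trim before a call, since the generated answer has to fit in whatever space the inputs leave over.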
Prompt Engineering
Designing instructions that steer model behavior
Prompt engineering is the work of defining what the model should do, what standards it should follow, and what style or format it should keep. It is less about clever phrasing and more about making the task, criteria, and boundaries unambiguous.
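One way to make task, criteria, and boundaries unambiguous is to assemble them as explicit sections rather than a single clever sentence. The section names and example content below are illustrative, not a standard.

```python
def build_prompt(task: str, criteria: list[str], boundaries: list[str]) -> str:
    """Assemble a prompt from explicit task, criteria, and boundaries."""
    lines = ["## Task", task, "", "## Quality criteria"]
    lines += [f"- {c}" for c in criteria]
    lines += ["", "## Boundaries"]
    lines += [f"- {b}" for b in boundaries]
    return "\n".join(lines)

prompt = build_prompt(
    "Summarize the support ticket in two sentences.",
    ["Keep the customer's own terminology.", "State the requested action."],
    ["Do not speculate about root cause.", "Output plain text only."],
)
```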
Context Engineering
Designing which information enters the current call
Context engineering is the design discipline of selecting and assembling only the information the current call needs. When retrieved documents, chat history, memory, and tool results matter more than the wording of one prompt, this becomes the central layer.
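A minimal sketch of the selection step, assuming candidates have already been scored for relevance by some upstream ranker; the scores, budget, and token estimate are placeholders.

```python
def assemble_context(candidates: list[tuple[float, str]],
                     budget_tokens: int = 500) -> list[str]:
    """Take the most relevant items first until the token budget is spent."""
    chosen, used = [], 0
    for score, text in sorted(candidates, reverse=True):
        cost = max(1, len(text) // 4)  # rough token estimate
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen

context = assemble_context([
    (0.9, "Refund policy: 5 business days."),
    (0.2, "Unrelated changelog entry " * 200),   # too large and low-scoring
    (0.7, "Customer is on the Pro plan."),
])
```

The point is that assembly is a deliberate design decision: what enters the call is chosen, not accumulated.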
Structured Output
Receiving results through a schema instead of free text
Structured output means asking the model for results that fit a schema or object shape instead of unconstrained prose. The goal is not just readability for humans. It is predictable consumption by code.
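A sketch of code-side consumption: the reply is rejected unless it matches the expected shape. The hand-rolled checks below stand in for real schema tooling such as JSON Schema or Pydantic, and the field names are illustrative.

```python
import json

def parse_ticket(raw: str) -> dict:
    """Accept a model reply only if it fits the expected object shape."""
    data = json.loads(raw)
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError("priority out of range")
    return data

ticket = parse_ticket('{"title": "Login fails", "priority": "high"}')
```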
Tool Use
Letting the model call external systems for data or actions
Tool use is the pattern where the model expresses an intention to call an external function or API, and the application runtime executes that call on the model's behalf. The model decides what needs to be queried or executed. The runtime keeps control over side effects.
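The split of responsibilities can be sketched as follows: the model's intention arrives as data, and the runtime decides whether and how to execute it. Tool names and arguments here are invented for illustration.

```python
# Runtime-owned registry: only registered tools can ever run.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def run_tool_call(call: dict) -> str:
    """Dispatch a model-proposed call; unknown tools are refused."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return "error: unknown tool"
    return fn(**call["arguments"])

result = run_tool_call({"name": "get_weather", "arguments": {"city": "Oslo"}})
```

Because execution goes through the registry, the model can propose anything but can only cause the side effects the runtime has explicitly allowed.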
Embeddings
Representing meaning as vectors
Embeddings convert text into numeric vectors so systems can compare meaning rather than just exact words. If two passages mean similar things, their vectors should end up closer together in the embedding space.
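The "closer together" idea is usually measured with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat     = [1.0, 0.9, 0.1]   # toy vector for "cat"
kitten  = [0.9, 1.0, 0.2]   # toy vector for "kitten"
invoice = [0.0, 0.1, 1.0]   # toy vector for "invoice"
# Similar meanings should score closer to 1.0 than unrelated ones.
```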
Chunking & Indexing
Preparing documents as searchable chunks and indexes
Chunking and indexing are the preparation steps that turn raw documents into retrievable units and store them so search can find them later. If RAG is the runtime pattern, chunking and indexing are the offline data layer that RAG depends on.
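The offline step can be sketched as a fixed-size chunker with overlap plus a toy inverted index. Sizes are illustrative, and production systems usually chunk along document structure and index embeddings rather than raw words.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks that overlap to avoid cutting context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(1, len(text) - overlap), step)]

def build_index(chunks: list[str]) -> dict[str, set[int]]:
    """Map each lowercase word to the ids of the chunks containing it."""
    index: dict[str, set[int]] = {}
    for i, c in enumerate(chunks):
        for word in c.lower().split():
            index.setdefault(word, set()).add(i)
    return index

index = build_index(["the quick fox", "fox runs"])
```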
RAG
Retrieving external knowledge to ground generation
RAG retrieves external knowledge at request time, places that evidence into the current context, and uses it to ground the answer. Instead of relying only on what the model already contains internally, it connects the answer to relevant outside material.
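The request-time flow can be sketched end to end: score stored chunks against the question, then place the best evidence into the prompt. The word-overlap scorer below is a stand-in for real vector search, and the chunks are invented examples.

```python
CHUNKS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]

def retrieve(question: str) -> str:
    """Pick the chunk sharing the most words with the question."""
    q = set(question.lower().split())
    return max(CHUNKS, key=lambda c: len(q & set(c.lower().split())))

def grounded_prompt(question: str) -> str:
    """Ground the answer in retrieved evidence, not internal recall."""
    return (f"Answer using only this evidence:\n{retrieve(question)}\n\n"
            f"Question: {question}")

prompt = grounded_prompt("how long do refunds take")
```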
Agent Workflow
A model-tool loop for solving goals over multiple steps
Agent workflow is the execution pattern where a model moves toward a goal by repeatedly choosing actions, observing results, and deciding what to do next. The important idea is not the label "agent" itself. It is the presence of state, iteration, and step-by-step control.
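The loop itself is small once the model is abstracted away. In this sketch a stub policy stands in for the model so the state, iteration, and stopping condition stay visible; the numeric goal is purely illustrative.

```python
def agent_loop(goal: int, max_steps: int = 10) -> tuple[int, int]:
    """Toy goal: reach a target number by repeated bounded increments."""
    state, steps = 0, 0
    while state < goal and steps < max_steps:   # step-by-step control
        action = min(goal - state, 3)           # stub "model" decision
        state += action                         # observe the result
        steps += 1                              # update loop state
    return state, steps

final_state, steps_taken = agent_loop(7)
```

The step limit matters: without it, a loop that never reaches its goal would run forever, which is why real agent runtimes budget steps, time, or cost.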
Memory
A persistence layer for carrying state across calls
Memory is the layer that stores information worth carrying beyond a single context window. Typical examples include user preferences, ongoing task state, prior decisions, and summarized conversation history that still matters later.
LLM Evals
A repeatable testing system for LLM quality
Evals are the repeatable measurement system used to compare LLM behavior over time. They make it possible to tell whether prompt, model, retrieval, or workflow changes actually improved the system or just moved failures around.
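A sketch of the smallest possible harness: fixed cases replayed against the system, producing a pass rate that can be compared run over run. The stub below stands in for a real LLM call, and the cases are invented.

```python
def system_under_test(question: str) -> str:
    """Stub standing in for a real LLM call."""
    return "paris" if "capital of france" in question.lower() else "unknown"

CASES = [
    ("What is the capital of France?", "paris"),
    ("What is the capital of Spain?", "madrid"),
]

def run_evals() -> float:
    """Replay every case and report the fraction that passed."""
    passed = sum(system_under_test(q) == expected for q, expected in CASES)
    return passed / len(CASES)

pass_rate = run_evals()
```

Because the cases are fixed, a change that lifts the pass rate improved the system, while one that only shifts which cases fail is visible as exactly that.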
Guardrails
Runtime controls for allowed behavior and safe fallbacks
Guardrails are the runtime controls that define what the system may accept, do, and return. They are less about asking the model to behave well and more about enforcing boundaries around actual inputs, actions, and outputs.
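The enforcement idea can be sketched as checks on both sides of the model call, with a safe fallback when either fails. The patterns and fallback message below are illustrative, not a complete policy.

```python
import re

BLOCKED_INPUT = re.compile(r"ignore previous instructions", re.IGNORECASE)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # example PII pattern
FALLBACK = "Sorry, I can't help with that."

def guarded(user_input: str, model_output: str) -> str:
    """Enforce boundaries on actual inputs and outputs, not intentions."""
    if BLOCKED_INPUT.search(user_input):   # input guard
        return FALLBACK
    if SSN_PATTERN.search(model_output):   # output guard
        return FALLBACK
    return model_output

safe = guarded("What are your opening hours?", "We open at 9am.")
```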
LLM Observability
Operational tracing for explaining LLM system behavior
Observability is the operational tracing layer that shows what actually happened inside an LLM system. Instead of looking only at the final answer, it connects the request, prompt snapshot, retrieval results, tool calls, validations, output, and user feedback into one explainable trace.
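One explainable trace can be sketched as a list of spans tied to a single request id. The stage names mirror the description above; a real system would attach timestamps and ship spans to a tracing backend, and the details here are invented.

```python
def trace_request(request_id: str) -> list[dict]:
    """Record every stage of one request under a shared id."""
    trace: list[dict] = []

    def record(stage: str, detail: str) -> None:
        trace.append({"request": request_id, "stage": stage, "detail": detail})

    record("prompt", "snapshot of assembled prompt")
    record("retrieval", "3 chunks returned")
    record("tool_call", "get_weather(Oslo)")
    record("output", "final answer text")
    return trace

spans = trace_request("req-42")
```

Reading the spans in order answers the debugging question "what actually happened" for that one request, instead of leaving only the final answer to inspect.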