Understand AI Engineering visually
Explore each concept's architecture through animated diagrams. Click a card to dive deeper.
Tokens & Context Window
The token budget a model can see in one call
Tokens are the small units a model reads, and the context window is the working space available in a single call. System instructions, the current question, earlier turns, retrieved documents, tool results, and even the answer being generated all have to fit inside that same space.
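The shared-budget idea can be sketched as a simple pre-flight check. The 4-characters-per-token ratio, window size, and answer reserve below are illustrative assumptions, not any specific model's real tokenizer or limit.

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_window(parts: list[str], window: int = 8000,
                   reserve_for_answer: int = 1000) -> bool:
    """Instructions, history, documents, and the answer share one window."""
    used = sum(approx_tokens(p) for p in parts)
    return used + reserve_for_answer <= window
```

In practice the same check decides what to trim before a call, since the generated answer has to fit in whatever space the inputs leave over.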
Prompt Engineering
Designing instructions that steer model behavior
Prompt engineering is the work of defining what the model should do, what standards it should follow, and what style or format it should keep. It is less about clever phrasing and more about making the task, criteria, and boundaries unambiguous.
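One way to make task, criteria, and boundaries unambiguous is to assemble them as explicit sections rather than a single clever sentence. The section names and example content below are illustrative, not a standard.

```python
def build_prompt(task: str, criteria: list[str], boundaries: list[str]) -> str:
    """Assemble a prompt from explicit task, criteria, and boundaries."""
    lines = ["## Task", task, "", "## Quality criteria"]
    lines += [f"- {c}" for c in criteria]
    lines += ["", "## Boundaries"]
    lines += [f"- {b}" for b in boundaries]
    return "\n".join(lines)

prompt = build_prompt(
    "Summarize the support ticket in two sentences.",
    ["Keep the customer's own terminology.", "State the requested action."],
    ["Do not speculate about root cause.", "Output plain text only."],
)
```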
Context Engineering
Designing which information enters the current call
Context engineering is the design discipline of selecting and assembling only the information the current call needs. When retrieved documents, chat history, memory, and tool results matter more than the wording of one prompt, this becomes the central layer.
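A minimal sketch of the selection step, assuming candidates have already been scored for relevance by some upstream ranker; the scores, budget, and token estimate are placeholders.

```python
def assemble_context(candidates: list[tuple[float, str]],
                     budget_tokens: int = 500) -> list[str]:
    """Take the most relevant items first until the token budget is spent."""
    chosen, used = [], 0
    for score, text in sorted(candidates, reverse=True):
        cost = max(1, len(text) // 4)  # rough token estimate
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen

context = assemble_context([
    (0.9, "Refund policy: 5 business days."),
    (0.2, "Unrelated changelog entry " * 200),   # too large and low-scoring
    (0.7, "Customer is on the Pro plan."),
])
```

The point is that assembly is a deliberate design decision: what enters the call is chosen, not accumulated.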
Structured Output
Receiving results through a schema instead of free text
Structured output means asking the model for results that fit a schema or object shape instead of unconstrained prose. The goal is not just readability for humans. It is predictable consumption by code.
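A sketch of code-side consumption: the reply is rejected unless it matches the expected shape. The hand-rolled checks below stand in for real schema tooling such as JSON Schema or Pydantic, and the field names are illustrative.

```python
import json

def parse_ticket(raw: str) -> dict:
    """Accept a model reply only if it fits the expected object shape."""
    data = json.loads(raw)
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if data.get("priority") not in {"low", "medium", "high"}:
        raise ValueError("priority out of range")
    return data

ticket = parse_ticket('{"title": "Login fails", "priority": "high"}')
```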
Tool Use
Letting the model call external systems for data or actions
Tool use is the pattern where the model expresses an intention to call an external function or API, and the application runtime executes that call on the model's behalf. The model decides what needs to be queried or executed. The runtime keeps control over side effects.
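The split of responsibilities can be sketched as follows: the model's intention arrives as data, and the runtime decides whether and how to execute it. Tool names and arguments here are invented for illustration.

```python
# Runtime-owned registry: only registered tools can ever run.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def run_tool_call(call: dict) -> str:
    """Dispatch a model-proposed call; unknown tools are refused."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return "error: unknown tool"
    return fn(**call["arguments"])

result = run_tool_call({"name": "get_weather", "arguments": {"city": "Oslo"}})
```

Because execution goes through the registry, the model can propose anything but can only cause the side effects the runtime has explicitly allowed.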
Embeddings
Representing meaning as vectors
Embeddings convert text into numeric vectors so systems can compare meaning rather than just exact words. If two passages mean similar things, their vectors should end up closer together in the embedding space.
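The "closer together" idea is usually measured with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat     = [1.0, 0.9, 0.1]   # toy vector for "cat"
kitten  = [0.9, 1.0, 0.2]   # toy vector for "kitten"
invoice = [0.0, 0.1, 1.0]   # toy vector for "invoice"
# Similar meanings should score closer to 1.0 than unrelated ones.
```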
Chunking & Indexing
Preparing documents as searchable chunks and indexes
Chunking and indexing are the preparation steps that turn raw documents into retrievable units and store them so search can find them later. If RAG is the runtime pattern, chunking and indexing are the offline data layer that RAG depends on.
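The offline step can be sketched as a fixed-size chunker with overlap plus a toy inverted index. Sizes are illustrative, and production systems usually chunk along document structure and index embeddings rather than raw words.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks that overlap to avoid cutting context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(1, len(text) - overlap), step)]

def build_index(chunks: list[str]) -> dict[str, set[int]]:
    """Map each lowercase word to the ids of the chunks containing it."""
    index: dict[str, set[int]] = {}
    for i, c in enumerate(chunks):
        for word in c.lower().split():
            index.setdefault(word, set()).add(i)
    return index

index = build_index(["the quick fox", "fox runs"])
```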
RAG
Retrieving external knowledge to ground generation
RAG retrieves external knowledge at request time, places that evidence into the current context, and uses it to ground the answer. Instead of relying only on what the model already contains internally, it connects the answer to relevant outside material.
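The request-time flow can be sketched end to end: score stored chunks against the question, then place the best evidence into the prompt. The word-overlap scorer below is a stand-in for real vector search, and the chunks are invented examples.

```python
CHUNKS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]

def retrieve(question: str) -> str:
    """Pick the chunk sharing the most words with the question."""
    q = set(question.lower().split())
    return max(CHUNKS, key=lambda c: len(q & set(c.lower().split())))

def grounded_prompt(question: str) -> str:
    """Ground the answer in retrieved evidence, not internal recall."""
    return (f"Answer using only this evidence:\n{retrieve(question)}\n\n"
            f"Question: {question}")

prompt = grounded_prompt("how long do refunds take")
```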
Agent Workflow
A model-tool loop for solving goals over multiple steps
Agent workflow is the execution pattern where a model moves toward a goal by repeatedly choosing actions, observing results, and deciding what to do next. The important idea is not the label "agent" itself. It is the presence of state, iteration, and step-by-step control.
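The loop itself is small once the model is abstracted away. In this sketch a stub policy stands in for the model so the state, iteration, and stopping condition stay visible; the numeric goal is purely illustrative.

```python
def agent_loop(goal: int, max_steps: int = 10) -> tuple[int, int]:
    """Toy goal: reach a target number by repeated bounded increments."""
    state, steps = 0, 0
    while state < goal and steps < max_steps:   # step-by-step control
        action = min(goal - state, 3)           # stub "model" decision
        state += action                         # observe the result
        steps += 1                              # update loop state
    return state, steps

final_state, steps_taken = agent_loop(7)
```

The step limit matters: without it, a loop that never reaches its goal would run forever, which is why real agent runtimes budget steps, time, or cost.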
Memory
A persistence layer for carrying state across calls
Memory is the layer that stores information worth carrying beyond a single context window. Typical examples include user preferences, ongoing task state, prior decisions, and summarized conversation history that still matters later.
LLM Evals
A repeatable testing system for LLM quality
Evals are the repeatable measurement system used to compare LLM behavior over time. They make it possible to tell whether prompt, model, retrieval, or workflow changes actually improved the system or just moved failures around.
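A sketch of the smallest possible harness: fixed cases replayed against the system, producing a pass rate that can be compared run over run. The stub below stands in for a real LLM call, and the cases are invented.

```python
def system_under_test(question: str) -> str:
    """Stub standing in for a real LLM call."""
    return "paris" if "capital of france" in question.lower() else "unknown"

CASES = [
    ("What is the capital of France?", "paris"),
    ("What is the capital of Spain?", "madrid"),
]

def run_evals() -> float:
    """Replay every case and report the fraction that passed."""
    passed = sum(system_under_test(q) == expected for q, expected in CASES)
    return passed / len(CASES)

pass_rate = run_evals()
```

Because the cases are fixed, a change that lifts the pass rate improved the system, while one that only shifts which cases fail is visible as exactly that.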
Guardrails
Runtime controls for allowed behavior and safe fallbacks
Guardrails are the runtime controls that define what the system may accept, do, and return. They are less about asking the model to behave well and more about enforcing boundaries around actual inputs, actions, and outputs.
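The enforcement idea can be sketched as checks on both sides of the model call, with a safe fallback when either fails. The patterns and fallback message below are illustrative, not a complete policy.

```python
import re

BLOCKED_INPUT = re.compile(r"ignore previous instructions", re.IGNORECASE)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # example PII pattern
FALLBACK = "Sorry, I can't help with that."

def guarded(user_input: str, model_output: str) -> str:
    """Enforce boundaries on actual inputs and outputs, not intentions."""
    if BLOCKED_INPUT.search(user_input):   # input guard
        return FALLBACK
    if SSN_PATTERN.search(model_output):   # output guard
        return FALLBACK
    return model_output

safe = guarded("What are your opening hours?", "We open at 9am.")
```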
LLM Observability
Operational tracing for explaining LLM system behavior
Observability is the operational tracing layer that shows what actually happened inside an LLM system. Instead of looking only at the final answer, it connects the request, prompt snapshot, retrieval results, tool calls, validations, output, and user feedback into one explainable trace.
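One explainable trace can be sketched as a list of spans tied to a single request id. The stage names mirror the description above; a real system would attach timestamps and ship spans to a tracing backend, and the details here are invented.

```python
def trace_request(request_id: str) -> list[dict]:
    """Record every stage of one request under a shared id."""
    trace: list[dict] = []

    def record(stage: str, detail: str) -> None:
        trace.append({"request": request_id, "stage": stage, "detail": detail})

    record("prompt", "snapshot of assembled prompt")
    record("retrieval", "3 chunks returned")
    record("tool_call", "get_weather(Oslo)")
    record("output", "final answer text")
    return trace

spans = trace_request("req-42")
```

Reading the spans in order answers the debugging question "what actually happened" for that one request, instead of leaving only the final answer to inspect.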