Conceptly

Caching

A technique that keeps nearby copies of frequently read data

Caching places a copy of frequently read data closer to the read path so requests do not have to reach the primary store every time. The original data stays where it is; a faster intermediate layer intercepts repeated lookups before they hit the source. Because every copy has a limited shelf life, the technique comes with decisions about how long to trust a cached value and when to throw it away.

Architecture Diagram: dashed-line animations indicate the flow direction of data or requests.

Why do you need it?

If the same data is repeatedly fetched from the primary store, latency rises and backend load climbs quickly. Popular read paths can start overwhelming the source of truth. Simply scaling the primary store is not always the cleanest or cheapest answer. Caching exists to shorten the read path by keeping a temporary copy nearby.

Why did this approach emerge?

As web systems grew, it became obvious that a huge share of traffic was read-heavy and highly repetitive. Scaling only the primary data store did not always address both cost and latency at once. That pressure made cache layers a standard tactic for protecting primary systems and making common reads cheaper and faster.

How does it work inside?

Applications usually check the cache first and only hit the primary store when the value is missing or stale. When a miss occurs, the application reads from the source and fills the cache for the next request. The core idea is not replacing the primary store, but inserting a faster shortcut in front of repeated reads.
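The read path described above is often called cache-aside. A minimal sketch, assuming an in-process dictionary as the cache and a hypothetical `fetch_from_primary` standing in for the database or upstream call:

```python
cache = {}  # in-process cache; real systems might use Redis or Memcached

def fetch_from_primary(key):
    # Hypothetical stand-in for a slow read from the source of truth.
    return f"value-for-{key}"

def get(key):
    # 1. Check the cache first.
    if key in cache:
        return cache[key]            # hit: the primary store is never touched
    # 2. On a miss, read from the source and fill the cache for next time.
    value = fetch_from_primary(key)
    cache[key] = value
    return value
```

The first `get("user:1")` misses and reads from the source; every repeat of the same key is then served from the nearby copy, which is exactly the "faster shortcut in front of repeated reads" the text describes.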

What is it often confused with?

Caching and CQRS can both improve read performance, but they operate at different levels. Caching keeps a nearby copy while preserving the same underlying model. CQRS changes the architecture more deeply by separating read and write models altogether. Caching is a shortcut; CQRS is a structural split.

When should you use it?

Caching is effective for product reads, configuration lookup, summaries, and other access patterns where the same data is requested repeatedly and slight staleness is acceptable. The important question is not just where to cache, but how long to trust cached data and when to invalidate it. Speed gains come with freshness tradeoffs that must be designed explicitly.
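The freshness tradeoff can be made concrete with a time-to-live (TTL) plus explicit invalidation. This is a sketch under assumptions: the 60-second TTL, the `loader` callback, and the `invalidate` helper are illustrative choices, not a prescribed API:

```python
import time

cache = {}          # key -> (value, expires_at)
TTL_SECONDS = 60    # assumption: up to 60s of staleness is acceptable

def get(key, loader):
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value             # still fresh: trust the cached copy
        del cache[key]               # expired: throw the copy away
    value = loader(key)              # miss or stale: go to the primary store
    cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

def invalidate(key):
    # Explicit invalidation after a write, so readers do not keep
    # seeing stale data for a full TTL window.
    cache.pop(key, None)
```

The TTL bounds how stale a value can get; explicit invalidation after writes tightens that bound for keys you know have changed. Choosing those two knobs deliberately is the design work the paragraph above points at.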

- APIs and web services with repeated read traffic
- Product lists or configuration data with high read frequency
- Systems whose primary store becomes a read bottleneck
- Experiences where response latency matters strongly