Bulkhead
Bulkhead separates resource pools so one overloaded path or dependency cannot consume all shared capacity. Like ship compartments, the goal is not to prevent every local failure, but to stop that failure from flooding the whole service.
[Architecture diagram: dashed lines show the direction of request flow]
If every request shares the same thread pool, connection pool, or worker queue, the slowest dependency or noisiest workload can starve everything else. Then a non-critical feature can degrade the entire service. Shared capacity becomes the channel through which local trouble turns into global outage.
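The failure mode can be seen in a few lines. This is a minimal sketch, assuming a single shared thread pool; the function names and sleep durations are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# One shared pool for every kind of work: 4 workers.
shared = ThreadPoolExecutor(max_workers=4)

def slow_dependency():
    time.sleep(0.5)   # stands in for a hung downstream call

def critical_work():
    return "ok"

# Four slow calls occupy every worker...
for _ in range(4):
    shared.submit(slow_dependency)

# ...so the critical request queues behind them instead of running.
start = time.time()
future = shared.submit(critical_work)
result = future.result()
waited = time.time() - start   # roughly the slow dependency's latency
```

The critical request is healthy, yet it pays the slow dependency's latency purely because it shares the same workers.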
As platforms relied more on external APIs, background jobs, and multi-tenant workloads, it became common for one slow dependency to tie up huge amounts of shared capacity. Simple shared-pool strategies stopped being good enough. Bulkhead became important because distributed systems needed local containment, not just global scaling.
Teams usually implement bulkheads by splitting worker pools, connection pools, queues, or concurrency limits by dependency, tenant class, or business priority. That way one pool can saturate while another still has room for critical traffic. Structuring the partitions means deciding what deserves protection and what should be allowed to degrade first.
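One lightweight way to express such a split is a semaphore-guarded wrapper per dependency class. The sketch below is one possible shape, not a standard API; it assumes a fail-fast policy (reject when the pool is full rather than queue), and the pool names are hypothetical:

```python
import threading

class Bulkhead:
    """Caps concurrent calls for one dependency class; rejects overflow."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: if this pool is saturated, fail fast
        # instead of tying up a caller waiting for capacity.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()

# One bulkhead per class of work: the reports path can saturate its
# two slots without touching the checkout path's capacity.
checkout = Bulkhead("checkout", max_concurrent=8)
reports = Bulkhead("reports", max_concurrent=2)
```

The sizing numbers encode the priority decision: rejected calls in the `reports` pool are the intended degradation, while `checkout` keeps its headroom.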
Circuit Breaker opens or closes calls based on error and timeout behavior. Bulkhead separates capacity so one class of failure cannot exhaust all resources in the first place. Load Balancer spreads traffic across instances; Bulkhead protects the inside of a service or instance from internal capacity collapse.
It is useful when critical user-facing flows coexist with lower-priority jobs, when dependencies have very different reliability, or when tenant isolation matters operationally. The tradeoff is that too many isolated pools can waste capacity and complicate tuning. Bulkheads work best when guided by real failure paths and saturation data.