Conceptly
⚖️

Load Balancer

Integration

A layer that spreads traffic across equivalent instances

Load Balancer is the traffic-distribution layer that sits in front of multiple equivalent service instances. Clients hit one stable address, while the balancer decides which healthy instance should receive each request. Its deeper value is not just sharing load, but hiding instance churn and failure from callers.

Architecture Diagram

Why do you need it?

With only one server instance, both performance bottlenecks and outages concentrate in one place. Even if you add more instances, clients should not need to track and choose among all of them directly. Once you operate many instances, you need a shared layer that can select healthy targets and absorb change.

Why did this approach emerge?

As horizontal scaling, autoscaling, and multi-zone deployments became normal, direct client-to-instance coupling stopped being workable. Instances now appear, disappear, and get replaced regularly. Load balancers became foundational because systems needed a stable front while compute topology kept changing underneath.

How does it work inside?

Requests arrive at the load balancer, which checks the target pool and forwards traffic to healthy instances according to its policy. Health probes remove failing targets from consideration, and the target set can change without affecting the public entry point. Some load balancers stay close to pure transport, while others can route based on richer HTTP information.
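The selection loop described above can be sketched in a few lines. This is an illustrative round-robin policy over a health-tracked target pool, not any particular product's implementation; the class, method names, and addresses are invented for the example.

```python
import itertools

class LoadBalancer:
    """Toy round-robin balancer over a pool of equivalent instances."""

    def __init__(self, instances):
        self._instances = list(instances)        # full target pool
        self._healthy = set(self._instances)     # targets passing health probes
        self._cycle = itertools.cycle(self._instances)

    def mark_unhealthy(self, instance):
        # A failed health probe removes the target from consideration,
        # without changing the public entry point.
        self._healthy.discard(instance)

    def mark_healthy(self, instance):
        self._healthy.add(instance)

    def pick(self):
        # Advance round-robin, skipping targets that failed their probes.
        for _ in range(len(self._instances)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy instances")

lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_unhealthy("10.0.0.2")
# Requests now rotate over the two remaining healthy instances.
print([lb.pick() for _ in range(4)])  # ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
```

When the probe for 10.0.0.2 recovers, `mark_healthy` returns it to rotation; callers never see the churn.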

Boundaries & Distinctions

Load Balancer answers which instance of the same role should handle this request. API Gateway answers which service and policy path the request should follow. Service Discovery helps find live service locations; Load Balancer chooses among them. They solve related but different layers of routing.
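The split between the two layers can be made concrete. In this toy sketch, a hypothetical in-memory dictionary stands in for a real discovery system: discovery answers "where are the live instances?", and load balancing only answers "which one handles this request?".

```python
def discover(service_name, registry):
    """Service discovery: look up the live addresses registered for a service."""
    return registry.get(service_name, [])

def balance(candidates, counter):
    """Load balancing: choose one instance among the discovered candidates."""
    return candidates[counter % len(candidates)]

# Hypothetical registry contents for illustration only.
registry = {"orders": ["10.0.1.5:8080", "10.0.1.6:8080"]}

candidates = discover("orders", registry)   # discovery layer
target = balance(candidates, counter=0)     # balancing layer
print(target)  # 10.0.1.5:8080
```

Keeping the two responsibilities separate is the point: the registry can change what `discover` returns without the balancing policy knowing or caring.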

When should you use it?

Load balancers are standard in web APIs, internal service pools, and worker fleets with many equivalent instances. They are especially important for zero-downtime rollout and high availability. But if state is trapped inside instance memory, traffic distribution alone is not enough; the state model has to be reconsidered too.

- Horizontally scaled web services
- Rolling and zero-downtime deployments
- Automatic removal of unhealthy instances
- High-availability systems spanning multiple zones
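The caveat about state trapped in instance memory can be illustrated with a hypothetical sketch: two instances keep session data only in local dictionaries, so round-robin routing splits one client's session across them.

```python
# Two instances, each holding session state only in local memory.
instance_sessions = {"i1": {}, "i2": {}}
instances = ["i1", "i2"]

def handle(request_no, session_id, value):
    # Round-robin routing: consecutive requests land on different instances.
    target = instances[request_no % len(instances)]
    instance_sessions[target][session_id] = value
    return target

handle(0, "s42", "cart=[book]")
handle(1, "s42", "cart=[book, pen]")
# The session is now split: i1 holds the stale cart, i2 the updated one,
# and neither instance has the full picture.
print(instance_sessions)
```

Moving the session into a shared store, or pinning a session to one instance, is the kind of state-model change the caution above refers to; distributing traffic alone does not fix it.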