Load Balancer
A load balancer distributes incoming network traffic across multiple servers, preventing any single server from being overloaded and providing high availability. Health checks automatically remove failed servers, while clients see a single stable entry point.
Architecture Diagram (dashed-line animations indicate the flow direction of data or requests)
As soon as you scale a service beyond one server, clients lose the ability to know which backend instance they should reach. One server may be overloaded while another sits mostly idle, and if traffic continues flowing to a failed node, users experience the whole service as unstable even when some replicas are healthy. You could try pushing backend awareness into every client, but that means every topology change becomes a client update problem. What you really need is a stable front door that stays constant while the backend pool changes behind it. A load balancer provides that front door and turns multiple servers into one service entry point.
When a single machine became the bottleneck, the first answer was often to buy a larger machine. But vertical scaling is expensive, finite, and fragile because one machine still represents one failure domain. As systems moved toward horizontal scaling, the operational question shifted from 'how do we make one server bigger?' to 'how do we spread work across many servers safely?' Load balancers emerged as the infrastructure answer to that question. They let applications scale out without teaching every client about every backend. Modern cloud and container platforms assume this pattern because multi-instance services are now a baseline operational model rather than an exceptional one.
To clients, a load balancer usually looks like a single IP or domain name. Behind that simple entry point, it maintains a pool of backend targets and decides where each connection or request should go. Health checks continuously test whether backends are alive, allowing failed instances to be removed automatically instead of silently absorbing traffic. At Layer 4, balancing decisions are made from transport-level information such as IPs and ports; at Layer 7, the balancer can inspect HTTP details like hostnames, paths, cookies, or headers and route more intelligently. That is why load balancers often end up doing more than simple distribution: they can also centralize TLS termination, shape rollout traffic, and enforce request-routing policy.
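The core loop described above can be sketched in a few lines: a backend pool that round-robins across targets while skipping any instance that health checks have marked as down. The `Pool` class and its method names here are illustrative, not from any particular load balancer.

```python
import itertools

class Pool:
    """Minimal sketch: round-robin selection that skips unhealthy backends."""

    def __init__(self, backends):
        self.backends = backends          # e.g. ["10.0.0.1:80", ...]
        self.healthy = set(backends)      # maintained by health checks
        self._rr = itertools.cycle(backends)

    def mark_down(self, backend):
        # Called when a backend fails its health check.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a backend passes health checks again.
        self.healthy.add(backend)

    def pick(self):
        # Advance the round-robin cursor, skipping unhealthy targets.
        for _ in range(len(self.backends)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

pool = Pool(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
pool.mark_down("10.0.0.2:80")             # health check failed
picks = [pool.pick() for _ in range(4)]
# Traffic now alternates between the two remaining healthy backends.
```

Clients never see any of this: they keep connecting to the same front-door address while the pool membership changes behind it.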
L4 and L7 load balancers both divide traffic across multiple targets, but they differ in what they understand about that traffic. L4 balancing is faster and more general because it only needs transport-level information, making it a good fit when you simply need stable distribution for connections. L7 balancing understands application-layer details, which enables routing based on URL paths, hostnames, cookies, and other request attributes. That extra intelligence makes it better for modern web and API architectures, but it also brings more complexity and policy surface area. So the choice is not really 'basic versus advanced'; it is whether the routing problem lives at the connection layer or in the application semantics above it.
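The difference in "what the balancer understands" shows up directly in the routing function. An L4 decision only has the connection tuple to work with; an L7 decision can match on hostnames and URL paths, as in this sketch (the hostnames, paths, and pool names are hypothetical):

```python
# First-match-wins routing table: (host, path_prefix, backend_pool).
ROUTES = [
    ("api.example.com", "/v2/", "api-v2-pool"),
    ("api.example.com", "/",    "api-v1-pool"),
    ("www.example.com", "/",    "web-pool"),
]

def route_l7(host, path):
    """Pick a backend pool from application-layer (HTTP) attributes."""
    for rule_host, prefix, pool in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return pool
    return "default-pool"

# An L4 balancer could never make this distinction: both requests
# arrive on the same IP and port, and only the HTTP layer differs.
assert route_l7("api.example.com", "/v2/users") == "api-v2-pool"
assert route_l7("api.example.com", "/health")   == "api-v1-pool"
```

The same mechanism underlies rollout shaping: swap a rule's target pool (or split it by percentage) and traffic shifts without clients noticing.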
Commonly Compared Concepts
CDN
Fast content delivery to users worldwide
A load balancer distributes traffic across servers within the same data center, while a CDN caches content at globally distributed edges to reduce geographic latency. They serve different purposes and operate in different locations.
Routing
Network path selection that decides where packets go next
Both send traffic somewhere, but routing chooses the path toward a destination network, while a load balancer chooses which backend instance behind the same service should receive the request.
Load balancers are a natural fit for web services, APIs, container platforms, zero-downtime deployments, and any backend where multiple instances should behave like one service. They become especially valuable when you need health-based failover, centralized TLS handling, or controlled traffic shifting between versions. They are less compelling when a single host is plenty and the added moving parts would only increase complexity. Persistent long-lived connections and stateful session designs also require more careful balancing policy so traffic does not behave unexpectedly. A load balancer adds the most value when the operational goal is not merely to have more servers, but to make many servers feel like one reliable service.
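For the stateful-session case mentioned above, one common policy is hash-based affinity: derive the backend from a stable session key so the same client keeps landing on the same instance. A minimal sketch (the key and backend names are illustrative; production balancers usually use consistent hashing so that removing one backend only remaps the sessions pinned to it):

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]

def pick_backend(session_id, backends=BACKENDS):
    """Deterministically map a session key to a backend (sticky choice)."""
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# The same session id always maps to the same backend,
# so server-side session state stays on one instance.
assert pick_backend("user-42") == pick_backend("user-42")
```

The trade-off is exactly the one the paragraph above warns about: affinity concentrates load unevenly and complicates failover, so it is worth using only when the session state genuinely cannot be shared or externalized.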