Protect Servers with HAProxy Connection Limits and Queues

By Nick Ramirez

HAProxy connection limits and queues can help protect your servers and boost throughput when load balancing large volumes of traffic.
When you use HAProxy as an API gateway in front of your services, it can protect those servers from traffic spikes. By using connection limits and queues, you can ensure traffic flows through your network at an even pace.

In this article, we look at how you can use maximum connection limits and queues in HAProxy to control how much traffic reaches your servers. By controlling the volume of traffic at the load balancer, you can even increase the throughput of your servers—an idea known as queue-based load leveling—because the servers run under optimal conditions. We’ll use a simple analogy to make things easy to understand: shopping at the supermarket!

Store-wide Maximum Occupancy

Imagine you’re on a break from work and you rush to the supermarket to find some lunch. When you arrive, you see a sign posted next to the door: “Maximum Occupancy: 1000”. All buildings have a maximum occupancy limit—a total number of people allowed inside at the same time. This makes the building safer, aiding a quick evacuation in case of a fire. Anyone arriving once the building is full will need to wait outside until someone comes out. Yes, in our imaginary world, everyone follows the rules!

HAProxy has a maximum occupancy limit too: the total number of connections allowed, process-wide. This stops the process from accepting too many connections at once, which safeguards it from running out of memory. You set it with the maxconn directive in the global section of your configuration:
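
As a minimal sketch, a global section carrying this limit could look like the following. The value 60,000 matches the figure discussed below; any other global settings you need are left out.

```haproxy
global
    # Process-wide ceiling on concurrent connections.
    # Connections beyond this wait in the kernel's socket queue.
    maxconn 60000
```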

Here, HAProxy will accept up to 60,000 TCP connections concurrently. Having this limit prevents denial-of-service scenarios that could happen if HAProxy had to maintain more connections than the server’s memory resources allowed. When calculating the amount of memory you’ll need, keep in mind that HAProxy will use two file descriptors (network sockets) for every incoming connection, since it needs to open a connection to the backend server too. With each end using about 16 KB of memory, plan for 32 KB per established connection. At 60,000 connections, that works out to a little under 2 GB.

Did You Know? When the global maxconn is not set, HAProxy derives it from the file descriptor limit reported by the Linux command ulimit -n. That limit is typically 1024, which leaves a connection limit too low even for moderate loads.

Connections that arrive after the limit is reached queue up in the kernel’s socket queue until a connection slot in HAProxy becomes free. So, there’s a good chance that even when overloaded, HAProxy will be able to pull excess clients out of the queue so quickly that they never even notice.

Department Maximum Occupancy

After entering the store, you see giant signs lining the walls: Meat & Seafood, Bakery, and Deli. Deli—that’s where you’re headed. In this store, each department has its own occupancy limit. For example, no more than 100 people should be crowded into the Deli section at once.

Just as a supermarket is made up of departments, an HAProxy load balancer separates services into individual frontend sections of the configuration. For example, you might have one frontend that receives traffic for your website, another for your database service, and another for your API. They each listen on a different TCP port and define their own maximum connection limit:
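
A sketch of three such frontends might look like this. The section names, ports, and backend names are placeholders chosen for illustration; only the maxconn value of 20,000 comes from the discussion that follows.

```haproxy
# Placeholder names and ports; adjust to your own services.
frontend website
    bind :80
    maxconn 20000
    default_backend web_servers

frontend database
    mode tcp
    bind :3306
    maxconn 20000
    default_backend database_servers

frontend api
    bind :8080
    maxconn 20000
    default_backend api_servers
```

With these values, the three frontend limits add up to the 60,000 connections allowed process-wide.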

Here, each frontend allows up to 20,000 connections to be active at once. Having these limits prevents any one frontend from hogging all of the available connection slots and starving the others. When this limit is reached, connections to that frontend queue up in the kernel’s socket queue, the same as with the global maxconn setting.

Checkout / Processing

You’ve found the deli, you’ve grabbed some ready-made food, and you head towards the cashiers to purchase your items. In our supermarket, each department has its own checkout counter where you can purchase your items.

This is similar to how a frontend in HAProxy sends connections to a backend to do the application-specific processing. A backend is a pool of servers—for example, web servers—that respond to clients. In our previous example, we used the default_backend directive in each frontend section to send clients to the correct backend. A backend and frontend pair looks like this:
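
One possible pairing is sketched below. The names website and web_servers, the server labels, and the IP addresses are hypothetical, and roundrobin is just one of the available balancing algorithms.

```haproxy
frontend website
    bind :80
    # Send every client of this frontend to the pool below
    default_backend web_servers

backend web_servers
    balance roundrobin
    # Hypothetical server names and addresses
    server s1 192.168.0.10:80
    server s2 192.168.0.11:80
    server s3 192.168.0.12:80
```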

In a backend section, you define one or more servers to process requests. HAProxy load balances connections or requests across them.

In our imaginary supermarket, servers are analogous to cashier lanes. The deli’s checkout counter (aka backend) may process multiple orders at once depending on how many cashier lanes (aka servers) are available. In HAProxy, you can add more servers to handle more concurrent connections.

What’s more, each server can process multiple connections at once. However, they have their limits. If you send too many connections to a server, you might deplete its memory or saturate its CPU. For that reason, you can set a maxconn parameter on each server line to limit how many connections HAProxy will send to that server:
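
A sketch of a backend with per-server limits could look like the following. The backend name, server labels, and addresses are made up for illustration; the limit of 30 matches the value discussed next.

```haproxy
backend web_servers
    balance roundrobin
    # Each server receives at most 30 concurrent connections from HAProxy
    server s1 192.168.0.10:80 maxconn 30
    server s2 192.168.0.11:80 maxconn 30
    server s3 192.168.0.12:80 maxconn 30
```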

In this example, HAProxy allows 30 active connections per server. If all of the servers are maxed out, connections queue up, waiting for an available server.

Did You Know? You can use a default-server directive to set default parameters that will apply to all server lines in the same section.
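
As a sketch of that tip, the per-server limit can be moved into a single default-server line. The backend name, server labels, and addresses here are hypothetical.

```haproxy
backend web_servers
    balance roundrobin
    # Applies maxconn 30 to every server line below in this section
    default-server maxconn 30
    server s1 192.168.0.10:80
    server s2 192.168.0.11:80
    server s3 192.168.0.12:80
```

A parameter set on an individual server line still overrides the default, which is handy when one machine is larger than the rest.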

A special behavior applies when load balancing HTTP traffic. To access HAProxy’s HTTP-specific features, you set mode http in the backend. In that case, the maxconn parameter on a server line no longer counts concurrent connections. Instead, it counts concurrent HTTP requests. In mode tcp, which is the default, it counts TCP connections. A backend runs in one mode or the other, and the same queuing logic applies in both.

Queues

What happens when all cashiers are busy processing orders? A queue begins to form, right? Then when a cashier becomes free, they signal to you that you are next.

Similarly, in HAProxy when all of the servers are processing their maximum number of requests, incoming requests queue up in the backend. Then, each time a slot to a server opens up, one of the queued clients uses it. Ideally, clients don’t stay queued for long and servers get to operate within their optimal range. Having an event-driven architecture, HAProxy can buffer a lot of active connections without exhausting its resources.

This method of protecting servers from overload has the effect of leveling out spikes in traffic. Servers receive a more uniform volume, which means they always operate within the bounds of their hardware specs, which in turn allows them to process requests efficiently and quickly. This is called queue-based load leveling, or as we often call it, server overload protection. Capping the number of concurrent requests sent to a server often results in higher throughput.

How long should a client wait in the queue though? You can set a maximum wait time by adding the timeout queue directive to your backend. In the following, updated example, a client will wait for up to 30 seconds in the queue, after which HAProxy returns a 503 Service Unavailable response to them:
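
A sketch of such a backend is shown here. The backend name, server labels, and addresses are placeholders; the 30-second value is the one described above.

```haproxy
backend web_servers
    balance roundrobin
    # Give up on a queued client after 30 seconds and return a 503
    timeout queue 30s
    server s1 192.168.0.10:80 maxconn 30
    server s2 192.168.0.11:80 maxconn 30
    server s3 192.168.0.12:80 maxconn 30
```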

Priority Customers

In HAProxy 1.9 and newer, you can prioritize HTTP requests that are waiting in the queue. Going back to our supermarket analogy, this is like giving some customers a pass that lets them cut to the head of the line. Use the http-request set-priority-class directive to tag requests as higher or lower priority.

In the following example, we define an ACL named is_checkout that checks whether the client has requested a URL path that begins with /checkout/. We prioritize those requests by setting the priority class to 1. Otherwise, we set it to 2:
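
A sketch of that setup could look like this. The backend name, server labels, and addresses are hypothetical; the ACL name, path prefix, and class values follow the description above.

```haproxy
backend web_servers
    # Priority classes apply to HTTP requests, so the backend runs in HTTP mode
    mode http
    balance roundrobin
    timeout queue 30s
    # True when the requested path starts with /checkout/
    acl is_checkout path_beg /checkout/
    # Lower class number means it is dequeued first
    http-request set-priority-class int(1) if is_checkout
    http-request set-priority-class int(2) if !is_checkout
    server s1 192.168.0.10:80 maxconn 30
    server s2 192.168.0.11:80 maxconn 30
```

The priority only matters once requests are actually queued, that is, when every server has reached its maxconn.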

The http-request set-priority-class directive assigns a priority number to a request. Lower numbers are given a higher priority. You can use any HAProxy ACL to form conditions for setting priorities. For example, you can base it on which type of resource the client is requesting, whether they’ve authenticated as a paying customer, or the type of API key they’re using (e.g. bronze, silver, gold).

Conclusion

In this blog post, you learned how to leverage limits and queues to safeguard HAProxy and your servers from overload. By controlling the volume of traffic at the load balancer, you can actually increase throughput! HAProxy also lets you assign a higher priority to some HTTP requests, which moves those requests to the front of the queue.
