Amazon Load Balancing Issues
As we move more of our services to the cloud (Amazon Web Services), we are pioneering our way into a brand new universe of opportunity. We also come across technical issues for which no documentation exists on the internet (we know, that sounds a bit like Star Trek and Lucasfilm). So we are learning every day, which makes this process very enjoyable.
To give an example, take Amazon’s Elastic Load Balancing service (ELB).
We moved some of our script-serving traffic to an auto-scaling load balancer and watched the results with all kinds of monitoring tools. The response time is fantastic! On their instances, the scripts are served in 0.15 seconds; our previous provider (one of the largest in the US, and in the world) served scripts in 0.20 seconds. That’s a 25% drop in response time!
There is, however, a downside: once in a while it takes 1.5 seconds to load a script. We are currently investigating with Amazon why this is happening; the explanation appears to be related to keeping Tomcat connections alive.
There is no reason for us to keep connections alive on a stand-alone server, but with the instances talking only to the load balancer, keeping the connections open could be another speed increase. Our main concern, of course, is the random spikes in loading time, but we’re confident we will figure it out pretty soon.
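To make the "once in a while" spikes easier to spot in a pile of timings, something like the following can help. It is only a minimal sketch: the function name `find_spikes`, the threshold of five times the median, and the sample timings are all our own illustrative assumptions, not measured data or part of any monitoring tool.

```python
from statistics import median


def find_spikes(samples, factor=5.0):
    """Return (index, seconds) pairs for samples slower than factor x the median.

    `factor` is an arbitrary illustrative threshold, not a recommended value.
    """
    if not samples:
        return []
    baseline = median(samples)
    return [(i, s) for i, s in enumerate(samples) if s > factor * baseline]


# Illustrative timings in seconds (made up to mirror the 0.15 s / 1.5 s pattern)
timings = [0.15, 0.16, 0.15, 1.5, 0.14, 0.15]
print(find_spikes(timings))  # → [(3, 1.5)]
```

Using the median rather than the mean as the baseline keeps a single 1.5 second outlier from dragging the threshold up with it.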
After some conversation with Amazon techies, it seems we have resolved the issues.
Here are some of the things they wrote:
“The load balancer tries to reuse persistent connections if the instance speaks HTTP/1.1. In this case, the load balancer may end up using a stale connection because the instance did not send the “Connection: close” header. This race condition may also have something to do with the non-2xx responses.
For HTTP load balancing, the keep-alive setting is a per-connection setting. What I mean by this is, When you change the setting on your web server, it affects the connection between ELB and your server. It does not have any affect on the connection from the client to ELB. So, although you may only expect your ultimate clients to each make single infrequent requests, ELB ends up concentrating requests from many clients over a smaller number of connections if you allow connection persistence on your server. This will offload much of the connection work from your server and also make better use of the available bandwidth.”
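The "concentrating requests over a smaller number of connections" part can be seen on a small scale with nothing but the Python standard library. The sketch below is a local stand-in, not ELB and not Tomcat: it starts a throwaway HTTP/1.1 server on localhost and counts how many TCP connections it accepts when a client reuses one persistent connection versus sending "Connection: close" every time. The function name `count_server_connections` is our own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_server_connections(num_requests, reuse):
    """Send num_requests GETs to a local server; return how many TCP
    connections the server accepted."""
    accepted = []  # one entry per accepted connection
    lock = threading.Lock()

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # allows persistent connections

        def setup(self):
            # setup() runs once per accepted connection, not once per request
            super().setup()
            with lock:
                accepted.append(self.client_address)

        def do_GET(self):
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # One persistent connection carries every request
            conn = http.client.HTTPConnection(host, port, timeout=5)
            for _ in range(num_requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A fresh connection per request, closed after each response
            for _ in range(num_requests):
                conn = http.client.HTTPConnection(host, port, timeout=5)
                conn.request("GET", "/", headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(accepted)


if __name__ == "__main__":
    print("with keep-alive:   ", count_server_connections(20, reuse=True))
    print("without keep-alive:", count_server_connections(20, reuse=False))
```

With keep-alive the server should see a single connection for all 20 requests; without it, one connection per request. The stale-connection race Amazon describes is not reproduced here; the sketch only shows the connection-count side of the trade-off.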
So we did some more tests with keep-alive, but nothing seemed to really help.
Eventually we set our
as that gave us the best performance.
From the Tomcat documentation:
The maximum number of HTTP requests which can be pipelined until the connection is closed by the server. Setting this attribute to 1 will disable HTTP/1.0 keep-alive, as well as HTTP/1.1 keep-alive and pipelining. Setting this to -1 will allow an unlimited amount of pipelined or keep-alive HTTP requests. If not specified, this attribute is set to 100.
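The quoted text appears to be the description of Tomcat's `maxKeepAliveRequests` connector attribute. To get a feel for what the three values mentioned there (1, -1, and the default of 100) mean in practice, here is a rough back-of-the-envelope model. It is a simplification under our own assumptions: a single client sends requests one after another, reconnects only when the server closes the connection, and idle timeouts are ignored. The function name `connections_needed` is hypothetical and not part of Tomcat.

```python
def connections_needed(total_requests, max_keep_alive_requests=100):
    """Estimate how many connections one sequential client opens.

    Simplified model: the server closes a connection after
    max_keep_alive_requests requests; -1 means it never does.
    """
    if total_requests < 0:
        raise ValueError("total_requests must not be negative")
    if total_requests == 0:
        return 0
    if max_keep_alive_requests == -1:
        return 1  # unlimited keep-alive: one connection serves everything
    if max_keep_alive_requests < 1:
        raise ValueError("this sketch only models -1 or values >= 1")
    # Integer ceiling division: each connection serves at most the limit
    return -(-total_requests // max_keep_alive_requests)


for setting in (1, 100, -1):
    print(setting, connections_needed(1000, setting))
```

In this model, 1000 requests need 1000 connections with a setting of 1, 10 connections with the default of 100, and a single connection with -1.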
One thing to mention, though: after I gave it some weeks of rest and accepted the random lag, I ran the test again and almost all the lag was gone!
Not sure if Amazon changed anything or if the load balancer ‘learned’ how many connections were needed, but you won’t hear me complaining.