Fixing an overloaded web server

This was a work-in-progress circa 1997. I was just learning how to tune web servers.

For a smattering of more up-to-date material about web performance, see my pages at www.kegel.com.

The problem

My company has deployed several small web server machines: Pentium 200s with 64MB of RAM, each serviced by a 10Mbps Internet link and running Red Hat Linux 4.2 and the Apache that ships with it. The web servers have severe performance problems for the first week or so after we post a new set of files for our users to download; I suspect something like 500 people are trying to download 80 megabytes worth of files all at the same time. My boss wanted to replace the Linux system with a Solaris box in hopes the problem would just go away. I knew that installing more RAM would probably help, but I didn't know how much RAM I'd need, and I wanted to understand the problem better.

Looking at the web benchmarks at www.spec.org, I noticed that all the highest scores were posted by machines running the Zeus web server software. Zeus is rumored to get part of its performance from careful program design; in particular, it serves many clients from a single process by multiplexing them all onto one thread. It's not free, though.
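Here's a rough sketch, written for this page and not taken from Zeus or thttpd, of what that single-process style looks like: one select() loop watches the listening socket and every client socket, so no extra process or thread is needed per client.

    /* Sketch of a single-process, single-thread server core (illustrative
     * only; real servers like Zeus or thttpd are far more elaborate). */
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <sys/select.h>

    int main(void)
    {
        int listener, maxfd, fd;
        fd_set active, readable;
        struct sockaddr_in addr;

        listener = socket(AF_INET, SOCK_STREAM, 0);
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8080);        /* port chosen just for illustration */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 128);

        FD_ZERO(&active);
        FD_SET(listener, &active);
        maxfd = listener;

        for (;;) {
            readable = active;
            if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
                continue;                   /* e.g. interrupted by a signal */
            for (fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &readable))
                    continue;
                if (fd == listener) {       /* new connection: just remember it */
                    int client = accept(listener, NULL, NULL);
                    if (client >= 0) {
                        FD_SET(client, &active);
                        if (client > maxfd)
                            maxfd = client;
                    }
                } else {                    /* existing client is ready to be served */
                    char buf[4096];
                    ssize_t n = read(fd, buf, sizeof(buf));
                    if (n <= 0) {           /* client finished or errored: drop it */
                        close(fd);
                        FD_CLR(fd, &active);
                    }
                    /* a real server would parse the request and send the reply
                     * here, keeping per-client state between passes */
                }
            }
        }
    }

The price of this style is that the server has to keep per-client state (how much of the request has been read, how much of the reply has been written), because it can never afford to block on any single client.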

A similar but free and much simpler package is thttpd from Acme Software. I decided to see first if thttpd was better than Apache under heavy overload conditions, and second how much RAM was needed to make Apache happy.

Experiment design

To build a test lab in my office to simulate the situation, I upgraded three old P90s to generic Pentium 200 MMX machines, each with a 6GB IDE disk, and installed stock Red Hat Linux 5.0. Total cost to upgrade each system: $450 for motherboard, CPU, and disk. I connected them all with a 10baseT hub.

I chose one machine as the server, configured it with 32MB of RAM, and installed both Apache (the version that came with Red Hat 5) and thttpd on it.

The remaining two machines were used as clients; one was equipped with 32MB of RAM, the other with 64MB. I installed Webstone 2.0.1 and configured it to simulate 242 simultaneous users pounding on the server constantly.

Our servers actually serve a mix of 10KB, 1MB, 10MB, and 60MB files; most of the load probably comes from the larger files. Rather than using a realistic file size mix, I used a set of 500-kilobyte files and varied the number of files to approximate the desired total fileset size.
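Any way of creating N files of 500KB apiece will reproduce this kind of fileset; here's a hypothetical little generator (mine, not part of Webstone or the original setup):

    /* Create "count" files of 500KB each, so the total fileset size is
     * roughly count/2 megabytes.  Purely illustrative. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int count = (argc > 1) ? atoi(argv[1]) : 16;   /* 16 files = an ~8MB fileset */
        char name[64], block[1024];
        int i, k;
        FILE *f;

        for (k = 0; k < (int)sizeof(block); k++)
            block[k] = 'a' + (k % 26);                 /* dummy page contents */

        for (i = 0; i < count; i++) {
            sprintf(name, "file%03d.html", i);
            f = fopen(name, "w");
            if (f == NULL) { perror(name); return 1; }
            for (k = 0; k < 500; k++)                  /* 500 x 1KB = 500KB per file */
                fwrite(block, 1, sizeof(block), f);
            fclose(f);
        }
        return 0;
    }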

Observations

thttpd logs requests via syslog, but to get the messages to show up properly in /var/log/messages under Red Hat Linux 5.0, I had to edit /etc/rc.d/init.d/syslog and add the -r parameter to the 'daemon syslogd' line. (The -r option tells syslogd to accept log messages arriving over the network as well as from the local socket.)
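For what it's worth, the logging mechanism itself is just the standard syslog(3) interface. Here's a sketch of the idea (not thttpd's actual code):

    /* How a server typically hands request log lines to syslogd, which then
     * routes them to /var/log/messages or wherever /etc/syslog.conf says. */
    #include <syslog.h>

    int main(void)
    {
        openlog("mini-server", LOG_PID, LOG_DAEMON);
        syslog(LOG_INFO, "GET /index.html 200 1234 bytes");  /* made-up log line */
        closelog();
        return 0;
    }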

thttpd version 1.95 and earlier rely on alarm() to provide a timeout on reading the http headers: when the SIGALRM comes in, it interrupts the read() call. This works fine, except when it doesn't :-) due to an OS bug or misconfiguration. To see whether your installation of thttpd has this problem, telnet to your server on the port where thttpd is running, and just sit there; it should time out after 5 seconds, and it should do this every time you try. Possible bugs include the timeout not working at all, or working correctly only the first time; when it doesn't work, thttpd freezes until you quit telnet, and won't serve any documents. Red Hat Linux 4.2 (kernel 2.0.30) has a bug in signal processing, so under it thttpd will never time out a stuck connection. You should not use thttpd 1.95 or earlier with kernels older than Linux 2.0.32 for anything but casual testing, or you'll find it getting stuck periodically. (thttpd 2.0 should solve this problem.)
If you have this problem, a short test program like the one below can help you check your Unix's signal behavior and figure out what's wrong.
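The test code I used isn't reproduced here, but a minimal check along these lines (my sketch, not the original program) shows whether SIGALRM actually interrupts a blocking read(), and whether it keeps doing so on later timeouts:

    /* Does SIGALRM interrupt a blocking read(), and does it do so every time?
     * Run it from a terminal and type nothing; a healthy system prints the
     * "interrupted" message three times, five seconds apart. */
    #include <stdio.h>
    #include <unistd.h>
    #include <signal.h>
    #include <errno.h>

    static void on_alarm(int sig)
    {
        (void)sig;   /* nothing to do; the point is that read() returns EINTR */
    }

    int main(void)
    {
        struct sigaction sa;
        char buf[128];
        ssize_t n;
        int i;

        sa.sa_handler = on_alarm;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;               /* no SA_RESTART, so read() is not restarted */
        sigaction(SIGALRM, &sa, NULL);

        for (i = 0; i < 3; i++) {
            alarm(5);                  /* 5-second timeout, like thttpd's */
            n = read(0, buf, sizeof(buf));
            if (n < 0 && errno == EINTR)
                printf("timeout %d: read() interrupted as expected\n", i + 1);
            else
                printf("timeout %d: read() returned %ld; signal did not interrupt it\n",
                       i + 1, (long)n);
            alarm(0);                  /* cancel any pending alarm */
        }
        return 0;
    }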

I did not make sure the clients were adequately equipped with RAM; however, they did provide a heavy enough load to saturate the 10baseT LAN. A real benchmark would have required 100baseT and more client systems with more RAM.

Banga and Druschel claim that benchmarks like Webstone leave much to be desired. They're right, but I'm using Webstone anyway, since it was all I had handy.

Webstone had to be patched to run under Linux. It also frequently complained that the master had received a SIGINT when it really hadn't; this problem is mentioned in the Webstone mailing list archives. It happened a lot at first, but went away, perhaps when I added more RAM to the systems running Webstone.

thttpd uses a single process to handle all requests. Red Hat Linux 5.0 is based on Linux 2.0.32, which has a per-process limit of 256 file descriptors; this puts an upper limit on the number of simultaneous clients that thttpd can serve. A patch is available to raise this limit to 1024, at the cost of increasing per-process memory consumption by about 50 kilobytes. I have not yet tried this patch, nor have I gone beyond 242 clients. In my application, I could work around the limit by running several copies of thttpd.

Linux 2.0.32 also limits the sum of open file descriptors in all processes to 1024; this is raised to 2048 by the above patch.
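A quick way (a sketch added here, not something from the thttpd documentation) to see what per-process descriptor limit your system actually gives you:

    /* Print the soft and hard per-process open-file limits; on stock Linux
     * 2.0.x kernels the number that matters here is 256. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
            printf("open files per process: soft %ld, hard %ld\n",
                   (long)rl.rlim_cur, (long)rl.rlim_max);
        return 0;
    }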

Several values in /etc/httpd/conf/httpd.conf should be set before running a benchmark. The ones I've noticed so far are MaxClients, StartServers, and MaxSpareServers, which should all be set to the expected number of clients or as high as system RAM allows, whichever is less. (Although see the Apache performance notes, which mention that newer versions of Apache require less fiddling to get good benchmark behavior.)
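For example, a run aimed at 250 or so simultaneous clients on a server with plenty of RAM might use settings along these lines (illustrative values, not my actual configuration):

    # httpd.conf excerpt for a ~250-client benchmark run (illustrative only)
    StartServers     250
    MaxSpareServers  250
    MaxClients       250

Apache 1.x also has a compiled-in HARD_SERVER_LIMIT, 256 by default, which caps how high MaxClients can go without recompiling.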

I initially used a cheap clone of an Intel EtherExpress card in one of the systems, but after about ten minutes of heavy load, the Red Hat 5.0 driver for the card would always print an error message and refuse to deliver any more packets. Installing a 3Com 3c905 resolved the problem.

Webstone has a test parameter that sets the length of the benchmark run in minutes, but it does not interrupt clients until they have finished downloading their current file. The bytes transferred after the end of the test period still get counted in the server and client throughput measurements, so the reported throughput comes out noticeably higher than the true value. With a run time of 1 minute and maximum download times of around 20 seconds, the reported bandwidth was greater than the physical bandwidth of the network interface. If you want to trust the numbers Webstone prints out, set the run time to something like 100 times the maximum file download time. I did not need those numbers, so I didn't worry about it much.

Results

Server equipped with 32MB of RAM

I used vmstat on the web server (watching the bi and bo columns, which show blocks read from and written to disk) to see whether disk i/o was going on as I varied the number of files being served. Apache started doing disk i/o when the fileset reached 3 megabytes; thttpd didn't start doing disk i/o until the fileset reached 13 megabytes.

Both Apache and thttpd seemed to reach about the same throughput even when serving up to 32 megabytes of files. At this load Apache was doing much more disk i/o than thttpd, but both came close to saturating the 10baseT.

During the 32 megabyte fileset test, I used a fourth computer to fetch a small html file from the web server. thttpd served up the file in five seconds; Apache took over a minute.

The above results for Apache were with the default setting of max server processes = 150 and initial server processes = 20. When Apache was tuned to start 150 or more server processes, the whole system became unresponsive as soon as the benchmark was started. When Apache was tuned to start 100 server processes, it ran sluggishly, but at least it ran.

Server equipped with 64MB of RAM

I tried upgrading the web server to 64 megabytes of RAM and repeating the 32 megabyte fileset test. Apache still did considerable disk i/o throughout the four-minute test even with 64 megabytes of RAM. According to Apache's performance notes, this means that 64MB of RAM is not really enough for Apache in this situation.

After Apache was tuned to allow 250 server processes, it responded just as quickly as thttpd when a fourth computer was used to fetch a small html file from the web server.

With 64MB of RAM, thttpd did zero disk i/o after a couple of minutes, so 64MB is more than enough for thttpd in this situation. I think this means that if the web server had 64MB of RAM and a 100Mbps Ethernet card, thttpd would be able to use much more of the available bandwidth than Apache could.

Conclusions

The overload problem our servers experience is probably due to disk thrashing from lack of RAM.
Installing more RAM and replacing Apache with thttpd both seemed to be good ways to solve the disk overload. In either case, learning how to tune the operating system and/or web server software for good performance under heavy load required several days.
I found building a test setup with Webstone and a few spare Pentiums running Red Hat Linux quite helpful in learning how to tune a web server for better performance.
I have not yet found good guidelines for how much RAM is required to run a web server, but I suspect that under Linux it is something like 8MB for the OS, plus 256KB per httpd process, plus 64KB per TCP connection, plus enough RAM to hold the entire active document tree. By that estimate, a web server with an 8MB active document tree and 250 simultaneous clients would need about 32MB for thttpd (8MB + 0.25MB for the single process + about 16MB for 250 connections + 8MB for the documents) and about 64MB for Apache.
http://alumni.caltech.edu/~dank/fixing-overloaded-web-server.html
Dan Kegel