For a smattering of more up-to-date material about web performance, see my pages at www.kegel.com.
Looking at the web benchmarks at www.spec.org, I noticed that all the highest scores were posted by machines running the Zeus web server software. Zeus is rumored to get part of its performance from careful program design, including using a single process to serve many clients. (The main design trick that enables this is to multiplex many clients onto a single thread.) Zeus is not free, though.
A similar but free and much simpler package is thttpd from Acme Software. I decided to see first if thttpd was better than Apache under heavy overload conditions, and second how much RAM was needed to make Apache happy.
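To make the single-process design concrete, here is a minimal sketch of one thread multiplexing several clients with a readiness loop. It is not code from Zeus or thttpd; the names serve, demo, and RESPONSE are made up for illustration, and it assumes each request fits in one read and each response in one write.

```python
import selectors
import socket

# Hypothetical canned response, used only for this illustration.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n"


def serve(listener, requests_to_serve):
    """Answer requests from many clients in one thread; return the count served."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    served = 0
    try:
        while served < requests_to_serve:
            # Block until the listener or any client socket is readable.
            for key, _events in sel.select():
                sock = key.fileobj
                if sock is listener:
                    # New connection: watch it alongside the others.
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                request = sock.recv(4096)
                if request:
                    # Tiny response; a real server would buffer partial writes.
                    sock.send(RESPONSE)
                    served += 1
                sel.unregister(sock)
                sock.close()
    finally:
        sel.close()
    return served


def demo(num_clients=3):
    """Connect several clients first, then serve them all from one thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(num_clients)
    clients = []
    for _ in range(num_clients):
        client = socket.create_connection(listener.getsockname())
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        clients.append(client)
    served = serve(listener, num_clients)
    replies = []
    for client in clients:
        replies.append(client.recv(4096))
        client.close()
    listener.close()
    return served, replies


if __name__ == "__main__":
    print(demo())
```

No process or thread is created per client; the only per-client cost is one file descriptor and a selector registration, which is why the file descriptor limit becomes the bottleneck for this kind of server.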
I chose one machine as the server, configured it with 32MB RAM, and installed both Apache (the one that came with Red Hat 5) and thttpd on it.
The remaining two machines were used as clients. One was equipped with 32MB RAM, the other with 64MB. Webstone 2.0.1 was installed and configured to simulate 242 simultaneous users pounding on the server constantly.
Our servers actually serve a mix of 10KB, 1MB, 10MB, and 60MB files; most of the load is probably from the larger files. Rather than using a realistic filesize mix, I used a set of 500 kilobyte files, and varied the number of files to approximate the desired total fileset size.
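The file count for a given fileset size is simple arithmetic. A small sketch follows; it assumes a kilobyte is 1024 bytes and a megabyte is 1024 kilobytes, and the function name is made up for illustration.

```python
FILE_BYTES = 500 * 1024  # one 500 kilobyte test file (assuming 1 KB = 1024 bytes)
MB = 1024 * 1024


def files_for_fileset(target_bytes, file_bytes=FILE_BYTES):
    """Return the number of equal-sized files that best approximates target_bytes."""
    return max(1, round(target_bytes / file_bytes))


if __name__ == "__main__":
    for target_mb in (32, 64):
        count = files_for_fileset(target_mb * MB)
        actual_mb = count * FILE_BYTES / MB
        print(f"{target_mb} MB target: {count} files, {actual_mb:.1f} MB actual")
```

Because the file size does not divide the target evenly, the actual fileset is only approximately the target size, which is all the test needs.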
thttpd versions 1.95 and earlier rely on alarm() to provide a timeout on the reading of the HTTP headers. When the SIGALRM comes in, it interrupts the read() call. This works fine, except when it doesn't :-) due to an OS bug or misconfiguration. To see if your installation of thttpd has this problem, telnet to your server on the port where thttpd is running, and just sit there; it should time out after 5 seconds, and it should do this every time you try. Possible bugs include the timeout not working at all, or working correctly only the first time; when it doesn't work, thttpd freezes until you quit telnet, and won't serve any documents.
On Red Hat Linux 4.2 (kernel 2.0.30), which has a bug in signal processing, thttpd will never time out a stuck connection. You should not use thttpd 1.95 or earlier with Linux older than 2.0.32 for anything but casual testing, or you'll find it getting stuck periodically. (thttpd 2.0 should solve this problem.)
If you have this problem, you can test your Unix's signal behavior with this test code to help figure out what's wrong.
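The test code referred to above is not reproduced here. As a separate illustration of the same idea, the sketch below checks from Python whether an alarm interrupts a blocked read, and whether it does so on repeated attempts. It is not thttpd's code; it assumes a Unix system and must run in the main thread, and all names in it are made up for illustration.

```python
import os
import signal
import time


class ReadTimeout(Exception):
    """Raised by the SIGALRM handler to break out of a blocked read."""


def _on_alarm(signum, frame):
    raise ReadTimeout()


def read_with_alarm(fd, seconds):
    """Return the bytes read from fd, or None if the alarm fired first."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return os.read(fd, 1024)
    except ReadTimeout:
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)


def check_repeated_timeouts(attempts=2, seconds=1):
    """Block on a silent pipe several times; report (timed_out, elapsed) per attempt."""
    read_fd, write_fd = os.pipe()  # nothing is ever written, so read blocks
    results = []
    try:
        for _ in range(attempts):
            start = time.monotonic()
            data = read_with_alarm(read_fd, seconds)
            results.append((data is None, time.monotonic() - start))
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return results


if __name__ == "__main__":
    for timed_out, elapsed in check_repeated_timeouts():
        print(f"timed out: {timed_out}, after {elapsed:.2f} s")
```

On a healthy system every attempt should time out after roughly the requested interval. A system with the kind of signal bug described above would be expected to hang instead, either on the first attempt or on a later one.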
I did not make sure the clients were adequately equipped with RAM; however, they did provide a heavy enough load to saturate the 10baseT LAN. A real benchmark would have required 100baseT and more client systems with more RAM.
Banga and Druschel claim that benchmarks like Webstone leave much to be desired. They're right, but I'm using Webstone anyway, since it was all I had handy.
Webstone had to be patched to run under Linux. Also, it often complained that the master had received a SIGINT when it really hadn't; this problem is mentioned in the Webstone mailing list archives. This happened often at first, but went away, perhaps when I added more RAM to the systems running Webstone.
thttpd uses a single process to handle all requests. Red Hat Linux 5.0 is based on Linux 2.0.32, which has a per-process limit of 256 file descriptors. This puts an upper limit on the number of simultaneous clients that thttpd can serve. A patch is available to increase this limit to 1024, at the cost of increasing the per-process memory consumption by about 50 kilobytes. I have not tried this patch, nor have I tried going beyond 242 clients. In my application, I could work around the limit by running several copies of thttpd.
Linux 2.0.32 also limits the sum of open file descriptors in all processes to 1024; this is raised to 2048 by the above patch.
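To get a rough feel for how these limits translate into client counts, the sketch below does the arithmetic. The number of descriptors reserved per process and the number used per client are assumed inputs, not measured values for thttpd, and the function names are made up for illustration.

```python
import math
import resource


def current_fd_limit():
    """Return this process's soft limit on open file descriptors (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def clients_per_process(fd_limit, reserved_fds, fds_per_client):
    """Clients one process can hold; reserved_fds and fds_per_client are assumptions."""
    return max(0, (fd_limit - reserved_fds) // fds_per_client)


def copies_needed(total_clients, fd_limit, reserved_fds, fds_per_client):
    """Number of server copies needed to hold total_clients at once."""
    per_copy = clients_per_process(fd_limit, reserved_fds, fds_per_client)
    if per_copy == 0:
        raise ValueError("no descriptors left for clients")
    return math.ceil(total_clients / per_copy)


if __name__ == "__main__":
    print("soft fd limit here:", current_fd_limit())
    # Illustrative inputs only: 256 fd limit, 16 reserved, 1 fd per client.
    print(clients_per_process(256, 16, 1))
    print(copies_needed(1000, 256, 16, 1))
```

Running several copies only helps up to the system-wide descriptor limit, so the total across all copies still has to fit under that ceiling.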
Several values in /etc/httpd/conf/httpd.conf should be set before running a benchmark. The ones I've noticed so far are MaxClients, StartServers, and MaxSpareServers, which should all be set to the expected number of clients or as high as system RAM allows, whichever is less. (Although see the Apache performance notes, which mention that newer versions of Apache require less fiddling to get good benchmark behavior.)
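The "expected clients or as much as RAM allows, whichever is less" rule can be sketched as below. The per-process memory figure and the amount of RAM reserved for the rest of the system are hypothetical inputs that you would have to measure yourself; only the three directive names come from the text above.

```python
MB = 1024 * 1024
KB = 1024


def suggested_server_count(expected_clients, ram_bytes, reserved_bytes, per_child_bytes):
    """Return min(expected clients, server processes that fit in the remaining RAM)."""
    affordable = max(0, (ram_bytes - reserved_bytes) // per_child_bytes)
    return min(expected_clients, affordable)


def directive_lines(count):
    """Format the three settings named above with the same value."""
    names = ("MaxClients", "StartServers", "MaxSpareServers")
    return [f"{name} {count}" for name in names]


if __name__ == "__main__":
    # Hypothetical inputs: 8 MB reserved for the OS, 256 KB per server process.
    for ram_mb in (32, 64):
        count = suggested_server_count(242, ram_mb * MB, 8 * MB, 256 * KB)
        print(f"# {ram_mb} MB RAM")
        print("\n".join(directive_lines(count)))
```

The output is only a starting point; the real per-process footprint depends on the Apache build and loaded modules.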
I initially used a cheapo clone Intel Ethernet Express card on one of the systems, but after about ten minutes of heavy load, the Red Hat 5.0 driver for the card always printed an error message and refused to deliver any more packets. Installing a 3Com 3c905 resolved the problem.
Webstone has a test parameter that sets the length in minutes of the benchmark run, but it does not interrupt clients until they have finished downloading the current file. The resulting error shows up in the server and client throughput measurements, which count the bytes that are transferred after the end of the test period, causing the throughput measurements to be noticeably higher than the true values. With a runtime of 1 minute, and maximum download times of 20 seconds, the reported bandwidth was greater than the physical bandwidth of the network interface. Run time should be set to something like 100 times the maximum file download time if you want to trust the numbers Webstone prints out. I did not need those numbers, so I didn't worry about it much.
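A rough worst-case model shows why the factor of 100 helps. Assume the link stays saturated for the whole run plus the longest overrun, and that all of those bytes are divided by the nominal run time only. This is a simplification for illustration, not a description of Webstone's internals, and the function names are made up.

```python
def worst_case_inflation(run_seconds, max_download_seconds):
    """Ratio of reported to true throughput under the worst-case model above."""
    return (run_seconds + max_download_seconds) / run_seconds


def recommended_run_seconds(max_download_seconds, factor=100):
    """Run length following the rule of thumb of 100 times the longest download."""
    return factor * max_download_seconds


if __name__ == "__main__":
    short_run = worst_case_inflation(60, 20)
    long_run = worst_case_inflation(recommended_run_seconds(20), 20)
    print(f"1 minute run, 20 s downloads: up to {short_run:.2f}x the true value")
    print(f"100x rule, 20 s downloads:    up to {long_run:.2f}x the true value")
```

Under this model a 1 minute run with 20 second downloads can overstate throughput by up to a third, which is consistent with reported bandwidth exceeding what the wire can carry, while the 100 times rule keeps the error to about 1 percent.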
Both Apache and thttpd seemed to reach about the same throughput even when serving up to 32 megabytes of files. At this load, Apache was doing much more disk i/o than thttpd, but both came close to saturating the 10baseT.
During the 32 megabyte fileset test, I used a fourth computer to fetch a small html file from the web server. thttpd served up the file in five seconds; Apache took over a minute.
The above results for Apache were with the default setting of max server processes = 150 and initial server processes = 20. When Apache was tuned to start 150 or more server processes, the whole system became unresponsive as soon as the benchmark was started. When Apache was tuned to start 100 server processes, it ran sluggishly, but at least it ran.
After Apache was tuned to allow 250 server processes, it responded just as quickly as thttpd when a fourth computer was used to fetch a small html file from the web server.
With 64MB of RAM, thttpd did zero disk i/o after a couple minutes, so 64MB is more than enough for thttpd in this situation. I think this means that if the web server had 64MB of RAM and a 100Mbps Ethernet card, thttpd would be able to use up much more of the available bandwidth than Apache.