Critical Section

Archive: March 12, 2003


Slashscreen

Wednesday,  03/12/03  11:46 PM

You work hard, day and night, off and on, posting cool stuff to your site, enjoying it for itself, taking pleasure in it, and sharing it with a few friends.  Then one day out of the blue, something you post catches a wave in the blogosphere, and before you know it, those few friends have turned into thousands of visitors.  Yippee!  ... And GAACK!  You've been - dum, de dum dum - slashdotted.  Now what?

The Basic Problem

Okay, so thousands of people come.  So what?  Why doesn't everything "just work"?  Well, there are a few possible problems, and one very likely problem:

The possible problems:

  • Your OS can't handle all the traffic.
  • Your CPU is maxed out.
  • Your disk is maxed out.
  • Your logs are full.

The very likely problem:

  • You ran out of bandwidth.

We're going to talk about each of these things, and then propose solutions.  But first, how do you know you have a problem?  Well, because you can't get to the site.  Pretty simple.  If you can get to the site, even if it is slow, you're fine.  Chances are everyone else can get in too, even if it is slow for them as well.  That's the best you can hope for, and if that's all that happens you're in fine shape.  You don't really have the problem - feel free to skip the solution, and jump down to optimization.

Say you can't get to the site.  You try, and nada - timeout - nothing.  So what's wrong?  The first thing you have to do is figure out how to talk to your server.  If you're physically near it, great, you have a console, you're in luck.  And if you have a local network connection to it, chances are you'll be able to get in and poke around.  If all you have is a remote connection you're going to be sad, because the remote connection is flooded (see "the very likely problem" above), and you have your basic catch-22. 

{
When I was recently slashdotted, I was actually out of town (of course!), and talked my wife through setting up another router to access my server.  You might want to think about how you would solve this problem if it ever happened...  even a dial-up works great as a back door when the front door is flooded.
}

Okay, now you're talking to your server.  How do you figure out what the problem is?  Let's take the possible problems in order:

Your OS can't handle all the traffic.  If you're running Windows, this is a possibility if you have a fast network connection.  Most of the time you'll run out of bandwidth before Windows networking gets flooded, but if you have a T1 or faster this could be your problem.  You'll know because your OS is dead.  As in - blue screen - as in - reboot...

Your CPU is maxed out.  As with the OS, this will mostly happen if you're running Windows, and will only happen if you have a fast network connection.  Critical Section has the world's lowliest CPU (a five-year-old Pentium II) and it has a lot of Korn Shell CGIs (one for just about every page view), and it was nowhere near maxed out when I was slashdotted.  On Linux you can monitor CPU consumption with top, on Windows use Task Manager.

Your disk is maxed out.  Disk I/O used to be the rate-limiting factor for web servers, but disks have gotten much faster and network connections have not.  It is possible this will become a bottleneck if you have a fast network connection.  You can tell on Linux by using top (a sluggish machine with a mostly idle CPU is the hint) or vmstat, and on Windows in Task Manager or, for actual disk counters, Performance Monitor.  If your machine "feels" really slow but you're not CPU-bound, this could be the problem.

Your logs are full.  Poor you.  This won't keep you from serving visitors, it will only keep you from knowing they came.  You can tell because, well, your disk(s) will be full and you will have a really huge log file somewhere.  This problem will occur in conjunction with one of the other problems; basically it is what it is, but it is not fatal.  You do want to do something about it (see below) because filling up your disks can cause other problems.
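
If you'd rather let a script find the culprit, here is a minimal Python sketch, offered as an illustration rather than anything this site actually ran.  It reports how full a disk is and lists the biggest files under a directory.  The function names are made up for this example.

```python
import os
import shutil

def percent_full(path: str) -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total

def largest_files(root: str, top: int = 5):
    """Return the `top` largest files under `root` as (size_in_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top]
```

Something like largest_files("/var/log/httpd") would be a reasonable first place to look on a Linux/Apache box, assuming that is where your logs live.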

You ran out of bandwidth.  This is by far the most likely problem.  You have tens or even hundreds of concurrent requests, and they are all sharing one little pipe, and none of them will be successful.  You can tell by running netstat under Linux, or using Task Manager under Windows XP (not sure what you do on Win2K... anyone?).  You can also tell by looking at the lights on your router (flashing wildly) or [if you're a geek] by viewing your router statistics (see picture).

[Pictures: "Slashdotted!" and "Slashdot bandwidth"]
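
Reading raw netstat output gets old fast when there are hundreds of lines.  Here is a small Python sketch that tallies connections to the web port by state.  It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, local address, foreign address, state); the function name and the sample addresses are invented for illustration.

```python
from collections import Counter

def count_states(netstat_output: str, port: int = 80) -> Counter:
    """Count TCP connections on a local port, grouped by connection state."""
    states = Counter()
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[3].endswith(suffix):  # fields[3] is the local address
            states[fields[5]] += 1
    return states

# Made-up sample in the shape of `netstat -tan` output.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0   8192 192.168.1.2:80          10.0.0.5:3312           ESTABLISHED
tcp        0   8192 192.168.1.2:80          10.0.0.9:4120           ESTABLISHED
tcp        0      0 192.168.1.2:80          10.0.0.7:1055           SYN_RECV
tcp        0      0 192.168.1.2:22          10.0.0.2:2201           ESTABLISHED
"""

if __name__ == "__main__":
    print(count_states(SAMPLE))
```

Lots of established connections with full send queues, and few of them ever finishing, is roughly what a flooded pipe looks like.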

If you don't know what your problem is - and can't figure it out - just assume you are out of bandwidth.

The Solutions

Okay, you have a problem, and you might know what it is.  What do you do about it?  Well, the first thing is to bring the site down.  Really really.  Don't worry, nobody was getting any response anyway.  Once the site is down, the pressure is off, and you can calmly reconfigure things to handle the onslaught.  Don't worry, everyone won't stay away, they'll be back...

Next - and this is really important - keep a written log of all the changes you make.  Many of your changes are short-term temporary things to get you through the next hours and days while you are basking in the glow of popularity.  In a week things will be calm again, and you'll want to go back to the way things were.  Also, in the heat of the moment you may try things which don't work at all, and you want to be able to back them out.  If there is one thing I can recommend for ANY computer problem solving situation, it is - keep a written log of all the changes you make.  Trust me, you will not remember everything you did.

Okay, so here are the things to try.  I'm suggesting solutions for Windows/IIS and Linux/Apache, because those are by far the two most common OS/webserver configurations.  (Also because those are the only two I know anything about :)

Your OS can't handle the traffic.  The main problem here is too many concurrent connections.  You want to limit the number of connections your website will accept. 

  • Under Windows/IIS the best way to throttle traffic is to use bandwidth throttling.  From your Computer Management console, select Internet Information Services, then right-click on your website and select Properties.  Now click on the Performance tab.  Check the box labeled "Bandwidth Throttling" and enter the maximum KB/s you want to allow.  Here's a good rule of thumb - web pages are around 20KB in size.  So if you set this to 20 it means you'll be serving about one page per second.  Setting this to 40 will serve about two pages per second.  For a typical PC-based website, that's pretty fast.  Remember you are throttling, deliberately holding back traffic so at least some of your visitors will get served.  Be conservative.
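
The rule of thumb above is simple division, sketched here in Python so you can plug in your own numbers.  The 20KB default is the page size assumed in the text; the function name is made up for this example.

```python
def pages_per_second(throttle_kb_per_s: float, page_kb: float = 20.0) -> float:
    """Estimate how many pages per second a bandwidth throttle allows."""
    return throttle_kb_per_s / page_kb

if __name__ == "__main__":
    for throttle in (20, 40):
        print(throttle, "KB/s ->", pages_per_second(throttle), "pages/s")
```

At 40 KB/s that works out to about 7,200 pages an hour, which is a lot of visitors for a PC in a closet.  If your popular page is heavier than 20KB, pass its real size as page_kb.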

Your CPU is maxed out.  The problem here is that you probably have lots of ASP or CGI pages, perhaps with a database interface.  You need to hold back some CPU cycles so the machine stays alive. 

  • Under Windows/IIS use process throttling.  From your Computer Management console, select Internet Information Services, then right-click on your website and select Properties.  Now click on the Performance tab.  Check the box labeled "Process Throttling" and enter the maximum  percent CPU you will allow IIS to use.  I suggest 75%, that seems like a good compromise.  Be sure to check the Enforce Limits box as well.
  • Under Linux/Apache use mod_throttle.  Edit your httpd.conf file and add the following lines:

    <IfModule mod_throttle.c>
         ThrottlePolicy request 2 1s
    </IfModule>

    This enables mod_throttle in "request" mode, which limits the number of requests per time period.  The example above shows 2 requests per 1 second, which is pretty fast for a typical PC-based website.  You could make this 1 request per 1s if you want to be conservative - that is still a lot of traffic...
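
To make the idea of "2 requests per 1 second" concrete, here is a minimal Python sketch of a fixed-window request limiter.  This is only an illustration of the concept; it is not how mod_throttle is implemented, and the class and method names are invented.

```python
class RequestWindow:
    """Allow at most `limit` requests in each `period`-second window."""

    def __init__(self, limit: int, period: float):
        self.limit = limit
        self.period = period
        self.window_start = None  # start time of the current window
        self.count = 0            # requests accepted in the current window

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` should be served."""
        # Open a fresh window on the first request or once the period is over.
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit; this visitor has to try again

if __name__ == "__main__":
    window = RequestWindow(limit=2, period=1.0)
    for t in (0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 1.2):
        print(t, window.allow(t))
```

The point is the same as with the real module: the requests you turn away cost you almost nothing, and the ones you accept actually finish.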

{
A digression - what if you don't have mod_throttle?  It is a third-party module rather than part of the core Apache distribution, so it may simply never have been installed.  The heat of battle is not the time to discover that.  So if you're reading this at a calm moment, check to make sure the mod_throttle RPM is installed, and while you're at it check to see if there are Apache updates available (the patches generally plug security holes which you want to have plugged).
}

Okay, on to the next problem...

Your disk is maxed out.  The problem here is that you are serving a lot of static pages, probably with lots of images, and/or you have a database interface as part of your page generation.  You want to limit the number of concurrent requests so some of your visitors get served.

  • Under Windows/IIS, restrict requests with bandwidth throttling, see "your OS can't handle the traffic" above.
  • Under Linux/Apache, restrict requests with mod_throttle, see "your CPU is maxed out" above.

Your logs are full.  There are two things to do here.  First, you could copy off the old log(s), and free up the disk space.  Second, you could turn off logging.  Turning off logging also reduces the resource requirements of serving each page; in many cases the overhead of making a log entry for each hit is far greater than the overhead of serving a page.  This is especially true if you have DNS resolution enabled (resolving visitors' IP addresses).

  • Under Windows/IIS, turn off logging as follows:  From your Computer Management console, select Internet Information Services, then right-click on your website and select Properties.  Now click on the Web Site tab.  Uncheck the Enable Logging checkbox.
  • Under Linux/Apache, turn off logging as follows:  Edit httpd.conf and find each CustomLog directive.  Comment out the directive and replace with a /dev/null path, like this:

# old one:
# CustomLog /var/log/httpd/access_log common
# new one (Apache does not allow a comment on the same line as a directive):
CustomLog /dev/null common

If you're like me, vanity will compel you to try to keep logging enabled.  Go ahead and try, but remember that the most important thing is to keep serving your visitors. 

You ran out of bandwidth.  This is the most likely problem, so if you don't know what to do, try this first.  Basically there are too many requests coming through at the same time, and nobody is going to get a response.  Your onramp to the information highway is gridlocked.  You have to restrict the number of requests so at least some of your visitors will get served.

  • Under Windows/IIS, restrict requests with bandwidth throttling, see "your OS can't handle the traffic" above.  Remember to be conservative - a 384K link moves 384 kilobits per second, which is only 48KB/s at the very best.  Half of that (about 24KB/s) would be more likely - 50% network utilization is a good target.
  • Under Linux/Apache, restrict requests with mod_throttle.  There are actually three different ways you can use mod_throttle, each is described below (you have to pick one).  Essentially you edit httpd.conf to add the following directives:

<IfModule mod_throttle.c>
     # Pick ONE policy; leave the other two commented out.
     # one way
     ThrottlePolicy request 2 1s
     # another way
     # ThrottlePolicy idle 500K 1s
     # a third way
     # ThrottlePolicy volume 200K 1s
</IfModule>

Each of these "ThrottlePolicy" directives accomplishes restricting requests, but they each do it differently.

  • The "request" directive restricts the number of requests accepted in a particular time period, as mentioned above. 
  • The "idle" directive forces a specified amount of idle time between requests.  This is helpful if your machine is doing other things besides web serving, or if you want to "pace" the traffic on your link (e.g. you want to be able to get in yourself!) 
  • The "volume" directive restricts the amount of bandwidth used in a particular amount of time.  Again, you should pick a value like 50% of the rated bandwidth of your link. 

So how do you pick?  I'd say if you have a slow connection, use the "idle" method.  I have an ISDN line (128K) and this worked well for me.  If you have a faster connection, the "volume" method is probably best.  Just make sure you don't set it too high...
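
One trap when picking a number: links are rated in kilobits per second, while throttles and page sizes are usually counted in kilobytes.  Here is a small Python sketch of the conversion, using the 50% utilization target suggested above.  The function name is made up for this example.

```python
def throttle_target_kb_per_s(link_kbit_per_s: float, utilization: float = 0.5) -> float:
    """Convert a link rating in kilobits/s into a throttle target in kilobytes/s."""
    raw_kb_per_s = link_kbit_per_s / 8  # eight bits per byte
    return raw_kb_per_s * utilization

if __name__ == "__main__":
    for link in (128, 384, 1544):  # ISDN, 384K DSL, T1
        print(link, "Kbit/s ->", throttle_target_kb_per_s(link), "KB/s")
```

So a 128K ISDN line gives you a budget of about 8KB/s, and a 384K link about 24KB/s.  Those are small numbers, which is exactly why bandwidth runs out first.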

So you make the configuration changes - and bring your site back up.  Yippee - you're alive!  Now you can bask in the glow of Internet affection.  Nothing quite like a "tail -f access_log" when the hits are coming faster than you can read them...  You might have to tune things more than once.  Human nature being what it is, you probably didn't throttle enough on your first set of changes.  This means requests out there are timing out, and that means wasted bandwidth.  Monitor things closely and don't be afraid to bring your site back down for a bit to reconfigure and throttle back further.
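
When the hits really are coming faster than you can read them, a per-minute summary is easier to act on than a scrolling tail.  Here is a minimal Python sketch that assumes Apache's Common Log Format (the "common" format used in the CustomLog lines above).  The function name and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Captures the timestamp down to the minute, the status code, and the byte count.
LINE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\] "[^"]*" (\d{3}) (\d+|-)'
)

def traffic_per_minute(log_lines):
    """Return (hits, kilobytes) Counters keyed by minute, e.g. '12/Mar/2003:23:46'."""
    hits = Counter()
    kilobytes = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if not match:
            continue  # not a Common Log Format line; ignore it
        minute, size = match.group(1), match.group(3)
        hits[minute] += 1
        if size != "-":  # "-" means no body was sent
            kilobytes[minute] += int(size) / 1024
    return hits, kilobytes

# Made-up sample lines.
SAMPLE_LOG = [
    '10.0.0.5 - - [12/Mar/2003:23:46:01 -0800] "GET /index.html HTTP/1.0" 200 20480',
    '10.0.0.9 - - [12/Mar/2003:23:46:30 -0800] "GET /index.html HTTP/1.0" 200 20480',
    '10.0.0.7 - - [12/Mar/2003:23:47:02 -0800] "GET /index.html HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    hits, kilobytes = traffic_per_minute(SAMPLE_LOG)
    for minute in sorted(hits):
        print(minute, hits[minute], "hits", round(kilobytes[minute], 1), "KB")
```

Divide the KB per minute by 60 and compare it with your throttle setting; if the two are close, you are serving about as much as the configuration allows.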

Well, that was good.  You reconfigured your server, brought the website back up, and now you're alive.  You can handle any amount of inbound traffic without dying if you throttle as described above.  Of course, many of your visitors will get nothing, but some will get served.  How can you maximize the number who aren't disappointed?  Read on...

Optimization

Once you're back up and running, naturally you'll want to tweak as much as possible to serve as many users as you can.  Here are some tips:

  1. Keep a written log of all the changes you make.  I know, I said this already.  But please do this - some of the things you try won't work, and other things you do will work, but will be of a temporary nature.  You'll want to have the ability to back things out later.
  2. Reduce any unnecessary overhead on the server.  If the machine is doing anything else - turn it off.  Under Windows use Task Manager to find and kill any stuff going on which is not essential.  On Linux use ps and kill to eliminate unnecessary processes.  You might also check crontab to make sure you don't have some scheduled things happening which are unnecessary.
  3. Figure out what to optimize.  Most likely you have posted a particular article, and everyone is trying to read it.  Your mission is to make that page fast, you don't have to optimize the whole site.  My site uses a frameset (I know, I know - but I like frames...).  When Tyranny of Email was slashdotted, I reconfigured that article to display as a single page, outside the frameset.
  4. Optimize the number of hits.  Any files referenced from your pages will cause more hits - CSS files, JavaScript files, images, etc.  This is the single best thing you can do, eliminate extra hits.  Here are some ideas:
    • Put CSS and JavaScript inline.  Normally it is a great idea to put "common" CSS and JavaScript in separate files.  But when you're under attack, you need to reduce hits.  Inline the CSS and the JavaScript for the page(s) which are being whacked.
    • Eliminate unnecessary images.  Any images which are decorations like logos, corners, bullets, etc. can be eliminated.  Just do this temporarily on the pages under attack.  Utility before beauty.  Some sites have little invisible images as spacers - kill them (yeah, the spacing is off a little, so what...)  Some sites use invisible images as beacons for hit counting and tracking - kill them.
    • Eliminate necessary images.  Yeah, really.  Tyranny of Email contains a cartoon image; it breaks up the page and is relevant, but not essential.  I temporarily took it out.
  5. Optimize the size of your pages.  As we talked about already, your most likely bottleneck is bandwidth.  If you can make your pages smaller, you'll be able to serve more users.  Here are some ideas - remember, you can do stuff temporarily and restore later when the wave has passed:
    • Remove unnecessary CSS.  Sometimes you have a site where every page has the same styles defined in it, generated by a template.  Or maybe you have CSS in a common file, but you took the advice to inline the CSS.  For the specific page being whacked, get rid of any styles you don't need.  If you want, get rid of the styles you do need, too, or simplify them.  It won't do you any good to have pretty pages if nobody can read them.
    • Remove unnecessary JavaScript.  Any "onload" animations and stuff like that can be temporarily eliminated.  If you have moved common JavaScript inline, eliminate any functions not used on this page.  Actually you might be able to eliminate functions which are used - mouseovers and stuff like that which are not essential.  Survive first, prosper later!
  6. Make pages static where possible.  This can be a temporary change.  On Critical Section virtually every page is generated by a CGI.  When I was flooded, I made the popular pages flat HTML.  Eliminate CGI and ASP - even if you have enough CPU cycles to run them, the responses are slowed and thus resources are consumed longer which could otherwise be available for serving another user.
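
To see how many extra hits a page costs before you start cutting, you can count the files it references.  Here is a minimal Python sketch using the standard html.parser module.  It only looks at images, scripts, frames and stylesheet links, it ignores browser caching, and the class name, function name and sample page are all made up for illustration.

```python
from html.parser import HTMLParser

class HitCounter(HTMLParser):
    """Collect the URLs a browser would request in addition to the page itself."""

    def __init__(self):
        super().__init__()
        self.extra = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script", "frame", "iframe") and attributes.get("src"):
            self.extra.append(attributes["src"])
        elif (tag == "link"
              and (attributes.get("rel") or "").lower() == "stylesheet"
              and attributes.get("href")):
            self.extra.append(attributes["href"])

def extra_hits(html: str):
    """Return the list of referenced files that each cause one more hit."""
    counter = HitCounter()
    counter.feed(html)
    counter.close()
    return counter.extra

# Made-up sample page.
PAGE = """<html><head>
<link rel="stylesheet" href="site.css">
<script src="menu.js"></script>
</head><body>
<img src="logo.gif"><img src="spacer.gif" width="1" height="1">
<p>The article text.</p>
</body></html>"""

if __name__ == "__main__":
    print(extra_hits(PAGE))
```

For the sample page that is four extra hits on top of the page itself, so inlining the CSS and JavaScript and dropping the two images would cut five requests per visitor down to one.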

Those are some ideas, you may think of others.  Essentially you want to do everything you can to keep things moving while the world's population hammers your site.  Later, after the dust settles, you can revert to normal.  If you want to.  But do take the time to...

Learn From the Experience

So it happened - you were popular for one day, you throttled your site, you optimized your pages, and you survived.  Congratulations and yippee.  Take the time to learn from the experience.  Maybe some of the things you did should be permanent changes.  Most likely you'll want to back out the throttling changes to the server.  But the site optimization changes might be good ones.  Do you have to have all those images?  Is every CSS style and JavaScript function necessary?  (Do you have to have frames? :)

So - that's it for my slashdot sunscreen.  Good luck - may you have the good fortune to need it! 

I'd be interested in your comments and suggestions, please shoot me email.

[ Later - I wrote up some more thoughts on Site Optimization... ]

 
 
