Recently I systematically optimized this little site. By way of documentation and in case it is of public interest, here's what I did...
HTML is a "loose" language. Just about anything goes. The popular browsers like Internet Explorer and Mozilla will "do the right thing" with all kinds of weird errors. But for maximum compatibility it is best to have pages which are "correct". The easiest way to make sure your pages are correct is to use an HTML validator. I like Doctor HTML, but there are a bunch out there. You point Doctor HTML at a page, and it tells you what (if anything) is wrong with it. This is a great way to pick up unclosed tags, invalid syntax, etc. - it also verifies links and even checks spelling. Most browsers and programs don't care about content-encoding, but some do. (The ones that don't pretty much assume U.S. ASCII is in use.) The easiest way to take care of this is simply specify the encoding in a META tag: <META HTTP-EQUIV="Content-type" CONTENT="text/HTML; charset=US-ASCII"> If you have templates for your pages, put this in the template and you're done. Finally, if you're a heavy user of CSS, be sure to test the CSS you're using on all the browsers with which you want to be compatible. I test with Internet Explorer, Mozilla, and Opera (Windows), Internet Explorer, Mozilla, and Safari (Mac), and Mozilla (Linux). Even though your CSS may be "valid", it may not be interpreted the way you want by all browsers. This is one reason I've stuck to frames and tables, they've been around so long pretty much all browsers treat them the same way.
Everyone's browsing experience will improve if you can reduce file sizes, especially for people with slower connections to the Internet. It also enables your site to serve more people concurrently with the same amount of bandwidth. There is nothing you can do which is better for your visitors (except give them interesting content!). Reducing file sizes breaks down into two activities: reducing image sizes and reducing page sizes.
Image sizes are a function of three things - the pixel dimensions of the image, the image format, and the compression ratio. You should never make images any bigger than they have to be. If you have a really big image which just must be big, put a thumbnail in the page's content that links to the full-size image in a new window. Any image bigger than 200 x 200 pixels is a candidate for shrinkage or thumbnailing. There are two image formats in wide use on the web: GIF and JPEG. GIFs are best for images with a small number of colors and well-defined borders - cartoons, diagrams, flow charts, and the like. JPEGs are best for images with color gradients and smooth transitions - mainly photographs. The coolest tool for shrinking images is Adobe Photoshop's "Save for the Web" feature. It lets you take any image and try "what if" scenarios with file format and compression ratio. In addition, when Photoshop saves for the web it optimizes image headers, storing only the minimum information required, and enables progressive rendering, so larger images are displayed incrementally as the browser receives data. There are other tools with similar capabilities, but Photoshop is the leader.
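If you go the thumbnail route, the markup is as simple as an image wrapped in a link; something like this (file names and dimensions here are just placeholders):

    <!-- Small thumbnail that opens the full-size image in a new window. -->
    <A HREF="images/sunset-large.jpg" TARGET="_blank">
      <IMG SRC="images/sunset-thumb.jpg" WIDTH="160" HEIGHT="120"
           ALT="Sunset over the bay (click for full size)">
    </A>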
HTML pages are plain text; making them smaller is pretty tough. Of course it is always better to use fewer words if you can - "brevity is the soul of wit" and all that - but that won't really make your pages smaller. The best thing you can do to reduce HTML page sizes is to implement GZIP compression. Each page is compressed before it is sent out over the network and decompressed by the browser, which typically reduces file sizes by about 50%. All modern browsers claim to support compression and do, but many robots do not; if the client does not support compression the server automatically sends an uncompressed page. There is really no downside to implementing this - do it!

If you're using Apache, the way to implement compression is via mod_gzip. There are many parameters for mod_gzip; I found this page to be very helpful. I use directives along the following lines in my HTTPD.CONF file:
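    # A representative mod_gzip (1.3.x) setup - the values here are
    # illustrative; tune them for your own site.
    <IfModule mod_gzip.c>
      # Turn compression on and let mod_gzip negotiate with the client.
      mod_gzip_on                 Yes
      mod_gzip_can_negotiate      Yes
      mod_gzip_dechunk            Yes
      # Skip very small files (not worth it) and very large ones (too much CPU).
      mod_gzip_minimum_file_size  500
      mod_gzip_maximum_file_size  500000
      # Compress HTML, CSS, and anything served as text/*; leave images alone.
      mod_gzip_item_include       file  \.html$
      mod_gzip_item_include       file  \.css$
      mod_gzip_item_include       mime  ^text/.*
      mod_gzip_item_exclude       mime  ^image/.*
      # Tell caches that the response varies by Accept-Encoding.
      mod_gzip_send_vary          Yes
    </IfModule>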
If you're using IIS, the way to implement compression is via the Web Service property sheet. Microsoft has a good description of how to do this on their website. They are cautious about recommending page compression for CPU utilization reasons, but in my experience it is always beneficial; most of the time your webserver runs out of bandwidth long before it runs out of CPU cycles. This page also has good information about configuring IIS for compression. After you get compression configured, you can test it using this site. Very handy.
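You can also check by hand: a client that supports compression sends an Accept-Encoding header, and a compressed response comes back with Content-Encoding: gzip. A typical exchange, abbreviated and with a placeholder host, looks like this:

    GET / HTTP/1.1
    Host: www.example.com
    Accept-Encoding: gzip, deflate

    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Encoding: gzip
    Vary: Accept-Encoding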
I don't know about you, but I've found that "robots" make up a good deal of the traffic to my site. These robots can be search engine spiders, indexing tools like Technorati, or analysis tools. There are also tons of RSS aggregators out there, and although they load your site's RSS feed first, many of them come back and get page data, too. So - I have my website set up to look at the HTTP_USER_AGENT, and if the client is a robot I serve a different home page. This serves several purposes:
How do you tell if you're dealing with a robot? Well, if the agent string doesn't start with "Mozilla" or "Opera", it's a robot. (For historical reasons all versions of Netscape and Internet Explorer have always used "Mozilla" in their user agent strings.) If it starts with "Mozilla" it might still be a robot pretending to be a browser; I check for two common cases, "Slurp" (Inktomi's spider) and "Teoma" (Ask Jeeves / Teoma's spider). There are others, but this will get you 99% of the robots.
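Here's a rough sketch of how that check could be done with Apache's mod_rewrite (this isn't necessarily how my site does it, and robots.html is just a placeholder for whatever alternate page you serve):

    RewriteEngine On
    # Anything that doesn't identify itself as Mozilla or Opera is a robot...
    RewriteCond %{HTTP_USER_AGENT} !^(Mozilla|Opera) [OR]
    # ...and so are known spiders that do claim to be Mozilla.
    RewriteCond %{HTTP_USER_AGENT} (Slurp|Teoma)
    # Serve the robot version of the home page instead of the normal one.
    RewriteRule ^/?$ /robots.html [L]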
It was a little more work, but it's nice to keep the robots happy :) [Update 1/1/23: no more special page for robots. This was a lot of work, but ultimately not needed anymore, if it ever was.]