Google’s Need for Speed – Use Cache in .htaccess to Speed Up Your Site
This is the fourth in a series of about seven posts (I keep finding more info and may add a couple more posts) on how to speed up your web pages and your website in general. Speeding up sites has been in the news recently because Google has announced that page speed is now a factor in how well a page ranks for keywords. Even though Google has made it clear that page speed isn’t going to move your page from position 38 to position 4 in search results, it is still worthwhile to consider page speed because of the benefit it gives your readers. It’s simple…people like websites and web pages that load quickly more than those that load slowly. (Yes…people have done studies on this!)
In this series we have covered, or will cover, the following topics:
- An Overview of Site Speed
- How to find out if your web pages are too slow
- How to optimize photos for faster loading
- How to use caching and htaccess to speed up your site (This post)
- How to compress your site
- How to speed up Google Analytics using asynchronous tracking codes
In this post we’re going to talk about how to use browser caching and htaccess to speed up your site. Caching and htaccess go hand-in-hand because, on an Apache server, the .htaccess file is the best place to control caching. Let’s take a look at what each one is and then get into speeding up your site:
What’s Browser Caching?
In general, a cache is a hidden storage space. On the web, browsers have caches (hidden storage areas on the hard disk) that hold files so they can be reused as often as needed. Logos are a good example: most websites have some type of logo in the header, which means it shows up exactly the same on every page of the site. If the website is built correctly and your browser is set correctly, the first time you visit a page the browser puts a copy of the logo in its cache for later use. As you click around the website, the browser sees that the same logo is supposed to be displayed at the top of each page. Instead of asking the website’s server for a new copy of the logo on every page, it just grabs the copy from the cache on the hard disk.
As you’ve probably guessed, there are two reasons browsers use caches:
- Speed Up Web Pages
Loading a file from the cache on your own hard disk is much, much faster than downloading it from the server again.
- Reduce Network Bandwidth Use
Downloading one copy of the website logo instead of one for each of the 10 pages you go to saves a lot of bandwidth.
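If it helps to see the logo example in code, here is a toy model in Python. It is only a sketch under simplifying assumptions: the `BrowserCache` class, the `example.com` URL and the one-hour lifetime are made up for illustration, and a real browser cache does far more. It shows the bandwidth point above: ten page views, one download of the logo.

```python
from dataclasses import dataclass, field


@dataclass
class BrowserCache:
    """Toy browser cache (hypothetical): maps a URL to (file contents, stale-after time)."""
    entries: dict = field(default_factory=dict)
    server_requests: int = 0

    def _download(self, url: str) -> bytes:
        # Stand-in for a real request to the web server; we only count it.
        self.server_requests += 1
        return f"contents of {url}".encode()

    def get(self, url: str, max_age: float, now: float) -> bytes:
        cached = self.entries.get(url)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: reuse the copy on disk, no network traffic
        body = self._download(url)
        self.entries[url] = (body, now + max_age)
        return body


# Click through 10 pages, 30 seconds apart; every page shows the same logo.
cache = BrowserCache()
for page in range(10):
    cache.get("https://www.example.com/images/logo.png", max_age=3600, now=page * 30)

print(cache.server_requests)  # 1
```

Once the lifetime runs out (here, after an hour), the next page view downloads the logo again and the clock starts over.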
What Files Should Be Cached and For How Long?
In my example above, I made it pretty clear that it’s a good idea to cache images, especially those that are used repeatedly on the site. However, there are many types of files that benefit the site if they are cached properly. Some files change rarely and others change more often, so it’s important to cache them for different lengths of time. However, for caching to be useful, the cache period should be at least two hours. Here are some common suggestions:
- 4 Hours: HTML files
- 2 Days: XML and TXT files
- 1 Year: ICO, PDF, FLV, JPG, JPEG, PNG, GIF, JS, CSS, and SWF files
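Cache lifetimes end up being written in seconds, so it helps to do the arithmetic once. Here is a small Python sketch of the conversion; the variable names are just for illustration.

```python
# Convert the suggested cache lifetimes into seconds, the unit used by
# the max-age setting of the Cache-Control header.
HOUR = 60 * 60   # 3,600 seconds
DAY = 24 * HOUR  # 86,400 seconds
WEEK = 7 * DAY

LIFETIMES = {
    "HTML": 4 * HOUR,                               # 14,400
    "XML, TXT": 2 * DAY,                            # 172,800
    "ICO, PDF, FLV, images, JS, CSS, SWF": 365 * DAY,  # 31,536,000
}

# Rule of thumb from above: a cache period under two hours isn't worth much.
MIN_USEFUL = 2 * HOUR
assert all(seconds >= MIN_USEFUL for seconds in LIFETIMES.values())

for files, seconds in LIFETIMES.items():
    print(f"{files}: max-age={seconds}")

# A "year" of exactly 52 weeks is one day shorter than 365 days.
print(52 * WEEK)  # 31449600
```

The Cache-Control example later in this post uses 31449600 for its one-year rule, which is the 52-week figure.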
This post assumes your website is hosted on a server that is using Apache to serve webpages. If you aren’t using Apache, much of this post will not apply to you because you can’t use htaccess.
So, what is htaccess? It’s an invisible configuration file, named .htaccess, that is placed in a directory of the website; its commands apply to that directory and everything below it. You put commands in the file and Apache carries them out as it sends out the webpages. In this post we’re going to learn commands that control how the Apache server uses HTTP caching to speed up your site and reduce how much bandwidth your site uses.
How to Get to the .htaccess File
As I said above, the htaccess file is an invisible file: on Linux and Mac systems, any file whose name starts with a dot is hidden by default. This means that unless you know what you’re doing, you can’t even see this file. Most people will be accessing the file via FTP. Most good FTP clients have an option somewhere (often in a “View” pulldown menu or in the application’s settings) that says something like “View invisible files” or “Hide/Unhide Invisibles.” You’ll need to find this option and set your FTP client so you can see invisible files. When you arrive at the root directory for your website, you may find a file at the top of the file list named “.htaccess”. This is the file you want to edit.
If you don’t find one, you’ll need to make one. Using your FTP client, create a new text file and name it, “.htaccess”. NOTE: There is no file extension and the name starts with a dot. Now you can edit that file and make the magic that’s in the rest of this post.
Please note a couple of things:
- Not all web hosting companies configure Apache the same way. Some, for reasons they often can’t explain, don’t turn on all the features this post needs (for example, the Header commands used below only work if Apache’s mod_headers module is enabled). If your hosting company doesn’t have them turned on, I encourage you to call and ask that they be turned on. You are standing on firm ground asking them to turn them on because everything in this post is now needed for a properly functioning website.
- Playing with your htaccess file can cause problems. Generally all that will happen is you’ll save your changes, then refresh your website and find that you’re getting an error (usually a “500 Internal Server Error”). All you need to do is remove the changes you made to the .htaccess file and re-save it, and the site will come back. Double-check that you’ve put the code in correctly. If you have, your hosting company probably does not support the Apache feature you’re trying to use.
How to Use Caching to Speed Up Your Site
1. Do NOT Use HTML Meta Tags To Control Caching
Long, long ago in a web far, far away there was a technique that used an HTML meta tag that looked something like this:
```html
<META HTTP-EQUIV="EXPIRES" CONTENT="Fri, 22 Jul 2022 00:00:01 GMT">
```
This tells the browser that it shouldn’t need to get new content for this page until the year 2022, and so it should cache everything. There are many problems with this, the biggest being that most browsers don’t pay any attention to this tag any longer. And even when a browser does read it, the tag only covers the HTML page it sits in, so you have no control over what types of files should be cached.
2. Use Far-Future Expires Headers and Cache-Control
There are two options to control caching of your site, the Expires header and the Cache-Control header. What’s the difference? PeterBe explains it quite bluntly here, but in my words, the Expires header works with older HTTP 1.0 browsers; Cache-Control is the more modern system and works with HTTP 1.1 compliant browsers.
So, should you use Expires or Cache-Control? The answer is, use both of them. Older browsers will ignore Cache-Control and use Expires. Modern browsers will override Expires and use Cache-Control.
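Here is roughly how that precedence works, as a simplified Python sketch of the HTTP/1.1 rule for a browser’s own cache: a max-age value in Cache-Control wins, and only when it is missing does the browser fall back to the Expires date minus the response’s Date header. The function name is hypothetical, and it deliberately ignores other directives such as no-cache and the guesswork browsers do when neither header is present.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[int]:
    """How many seconds a browser may reuse a response without asking again.

    Simplified HTTP/1.1 rule for a browser's private cache:
    1. a max-age directive in Cache-Control wins;
    2. otherwise use the Expires date minus the Date header;
    3. otherwise there is no explicit lifetime (None).
    """
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        sent = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - sent).total_seconds()))
    return None


response = {
    "Date": "Thu, 01 Jan 2015 00:00:00 GMT",
    "Expires": "Thu, 15 Jan 2015 20:00:00 GMT",
    "Cache-Control": "max-age=14400, must-revalidate",
}
print(freshness_lifetime(response))  # 14400: max-age overrides Expires

# What an HTTP 1.0 browser effectively sees: it doesn't understand Cache-Control.
old_style = {k: v for k, v in response.items() if k != "Cache-Control"}
print(freshness_lifetime(old_style))  # 1281600: Expires minus Date (14 days 20 hours)
```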
Setting Up Expires HTTP Headers
Setting Expires headers is quite simple. You open up the .htaccess file in the root directory of your website and paste in the text below:
```apache
# Set Expires Headers
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
  Header set Expires "Thu, 15 Jan 2015 20:00:00 GMT"
</FilesMatch>
```
This is telling the browser to try to use cached copies of all ico, pdf, flv, jpg, jpeg, png, gif, js, css, and swf files until 2015. The expiration date must be a fixed date, so if you don’t expect these files to change you’ll want to set it far into the future or you’ll need to put a reminder in your calendar to update your Expires header before it expires.
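Because the Expires header needs a date in the exact HTTP format, it can be handy to generate it rather than type it. The Python sketch below uses the standard library’s `email.utils.format_datetime`, which writes dates in that format; `expires_header` and `has_expired` are hypothetical helper names, and the second one is a scripted version of the calendar reminder.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date (always in GMT)."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def has_expired(expires_value: str, now: datetime) -> bool:
    """True once a fixed Expires date (like the one in .htaccess) has passed."""
    return parsedate_to_datetime(expires_value) <= now


print(expires_header(datetime(2015, 1, 15, 20, 0, 0, tzinfo=timezone.utc)))
# Thu, 15 Jan 2015 20:00:00 GMT

# A far-future date one year from today, ready to paste into .htaccess.
now = datetime.now(timezone.utc)
print(expires_header(now + timedelta(days=365)))

# The 2015 date in the example has already gone by, so it would need updating.
print(has_expired("Thu, 15 Jan 2015 20:00:00 GMT", now))  # True
```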
Setting Up Cache-Control Headers
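The Cache-Control header gives you a lot more power. Instead of a fixed date, its max-age setting gives the number of seconds, counted from when the browser receives the file, that the cached copy can be used before it expires. That lets you give each type of file its own lifetime, and you never have to go back and update a date.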
```apache
# Set the cache-control max-age

# 1 year (52 weeks)
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
  Header set Cache-Control "max-age=31449600, public"
</FilesMatch>

# 2 DAYS
<FilesMatch "\.(xml|txt)$">
  Header set Cache-Control "max-age=172800, public, must-revalidate"
</FilesMatch>

# 4 HOURS
<FilesMatch "\.(html|htm)$">
  Header set Cache-Control "max-age=14400, must-revalidate"
</FilesMatch>
```
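To double-check which rule a given file falls under, the three FilesMatch patterns can be tried against file names with Python’s `re` module. This is only a sketch: it assumes, as Apache does for FilesMatch, that the pattern is tested against the file name, and `cache_control_for` is a hypothetical helper, not part of Apache.

```python
import re
from typing import Optional

# The FilesMatch patterns and Cache-Control values from the .htaccess example.
RULES = [
    (r"\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$", "max-age=31449600, public"),
    (r"\.(xml|txt)$", "max-age=172800, public, must-revalidate"),
    (r"\.(html|htm)$", "max-age=14400, must-revalidate"),
]


def cache_control_for(filename: str) -> Optional[str]:
    """Return the Cache-Control value that would be set for a file name, or None.

    The patterns don't overlap, so at most one of them can match.
    """
    for pattern, value in RULES:
        if re.search(pattern, filename):
            return value
    return None


for name in ["logo.png", "sitemap.xml", "index.html", "contact.php", "style.css.bak"]:
    print(f"{name:15} -> {cache_control_for(name)}")
```

Files such as `contact.php` or `style.css.bak` match none of the patterns (the `$` anchors each one to the end of the name), so these rules leave their headers alone.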
3. Remove ETags and Last Modified
Above we’ve done everything needed to speed up your website by using caching. However, there are two other headers involved in caching: ETags and Last Modified. Instead of saying how long a file stays fresh, they let the browser ask the server whether its cached copy has changed, so they basically accomplish the same thing and don’t need to be used. As AskApache said in his fantastic posts on caching:
If you remove the ETag header, you will totally eliminate If-None-Match requests and their 304 Not Modified Responses, so a file will stay cached without checking for updates until the Expires header indicates new content is available!
Because we’re now using Cache-Control and Expires, there is no need for these other two, except for HTML documents, where one of them (Last Modified, in the code below) should be left on.
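To see what the quote means, here is a simplified Python model of the exchange between browser and server once a cached copy has gone stale: if the browser has an ETag, it asks “has this changed?” with an If-None-Match header, and the server answers 304 Not Modified (no file) when the tag still matches. The function names and the ETag value are made up, and real servers also handle lists of tags and weak tags.

```python
from typing import Optional


def revalidation_headers(cached_etag: Optional[str]) -> dict:
    """Extra headers a browser sends when re-checking a stale cached copy."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def server_status(request_headers: dict, current_etag: Optional[str]) -> int:
    """Simplified server answer: 304 if the browser's ETag still matches, else 200.

    304 Not Modified has no body (the browser reuses its copy);
    200 OK sends the whole file again.
    """
    if current_etag is not None and request_headers.get("If-None-Match") == current_etag:
        return 304
    return 200


etag = '"12d4-4c5f7e80"'  # made-up ETag value

# With ETags on: an extra round trip, answered with a bodiless 304.
print(server_status(revalidation_headers(etag), etag))  # 304

# With ETags off: there is nothing to put in If-None-Match, so no check is made;
# once the Expires/max-age time runs out, the file is simply downloaded again.
print(server_status(revalidation_headers(None), None))  # 200
```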
Another reason to use Cache-Control and Expires instead of ETags and Last Modified is that more and more hosting companies are using clusters of servers (cloud hosting). ETags and Last Modified only work reliably when the file always comes from the exact same server. Each server in a cluster can report a different ETag or modification date for the same file (Apache’s ETag can include the file’s inode number, which differs from machine to machine), so the browser decides the file has changed and downloads it again. Cache-Control and Expires don’t have that problem.
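The cluster problem can be illustrated with a hypothetical ETag built from file details, loosely modeled on Apache’s FileETag options (INode, MTime, Size). It is not Apache’s exact algorithm, only a sketch: two servers holding byte-for-byte identical copies of a file will usually store it under different inode numbers, so any ETag that includes the inode won’t match between them. If the copies were uploaded at different times, the modification time, and therefore Last Modified, can differ too.

```python
from dataclasses import dataclass


@dataclass
class FileOnServer:
    inode: int  # filesystem ID of the file; differs between machines
    size: int   # bytes
    mtime: int  # last-modified time, seconds since the epoch


def make_etag(f: FileOnServer, include_inode: bool) -> str:
    """Hypothetical ETag: hex fields joined by dashes, inode optional."""
    parts = [f.size, f.mtime]
    if include_inode:
        parts.insert(0, f.inode)
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


# The same logo deployed to servers in a cluster.
server_a = FileOnServer(inode=1234567, size=4821, mtime=1280000000)
server_b = FileOnServer(inode=7654321, size=4821, mtime=1280000000)
server_c = FileOnServer(inode=2222222, size=4821, mtime=1280003600)  # uploaded an hour later

print(make_etag(server_a, True) == make_etag(server_b, True))    # False: inodes differ
print(make_etag(server_a, False) == make_etag(server_b, False))  # True
print(make_etag(server_a, False) == make_etag(server_c, False))  # False: mtimes differ
```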
How to Remove ETags and Last Modified
Simply copy and paste the text below into your .htaccess file, save it, and then check to make sure it didn’t break your site (it could, depending on the Apache setup). If it breaks anything, simply undo the changes and save the old version again.
```apache
# Turn off the ETags
Header unset ETag
FileETag None

# Turn off the Last Modified header except for html docs
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css)$">
  Header unset Last-Modified
</FilesMatch>
```
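After saving, one way to check the result (besides loading the site in a browser) is to look at the headers the server actually sends. Here is a small Python sketch using only the standard library; the function names are hypothetical and `www.example.com` stands in for your own site. For an image you’d hope to see Cache-Control and Expires but no ETag or Last-Modified.

```python
from urllib.request import Request, urlopen

# The headers this post adds or removes.
CACHE_HEADERS = ("Cache-Control", "Expires", "ETag", "Last-Modified")


def summarize(headers) -> dict:
    """Pick out the caching headers; a value is None when the header is absent."""
    return {name: headers.get(name) for name in CACHE_HEADERS}


def fetch_cache_headers(url: str, timeout: float = 10.0) -> dict:
    """Send a HEAD request and summarize the caching headers in the reply.

    urlopen raises urllib.error.HTTPError for error statuses such as
    500 Internal Server Error, which is what a broken .htaccess usually causes.
    """
    request = Request(url, method="HEAD")  # headers only, no file body
    with urlopen(request, timeout=timeout) as response:
        # response.headers looks names up case-insensitively.
        return summarize(response.headers)
```

For example, `fetch_cache_headers("https://www.example.com/images/logo.png")` returns a dictionary with the four headers, and for an HTML page you should still see a Last-Modified value.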
Use .htaccess to Speed Up WordPress
Since you’re in .htaccess already, you might want to check out a post I wrote a while back about how to speed up the WordPress rewrite engine for optimized URLs. If you’re using WordPress and are using optimized (readable) URLs, this little change to your .htaccess file can speed up your site considerably.
The next post in this series will be much shorter…how to set up .htaccess so text-based files are compressed on the web server and download faster.