In a recent post, I presented a few easy steps that anyone can take to speed up their WordPress blog. My aim in that post was to introduce a basic methodology for measuring performance (YSlow), along with a fairly simple recipe for improving a website’s YSlow score, and thereby its performance. Much to the horror of my sysadmin friends, I encouraged people to blindly follow the instructions as given, though in my defense I did also strongly encourage readers to peruse the related articles for more information!
In the comments, reader zuborg asks:
“Very funny to observe that wordpress blog posts recommend to remove ETags from http-headers. Did somebody investigate what is and for what purpose it was designed?”
This is a very good question, and one I would like to respond to in more detail here.
The Caching Tutorial for Web Authors and Webmasters article provides an excellent introduction to caching, including a good section on cache validation and ETags. I consider the article a must-read for anyone interested in how Internet caching works. I’ll quote from that article here:
“HTTP 1.1 introduced a new kind of validator called the ETag. ETags are unique identifiers that are generated by the server and changed every time the representation does. Because the server controls how the ETag is generated, caches can be surer that if the ETag matches when they make a If-None-Match request, the representation really is the same.”
“Almost all caches use Last-Modified times in determining if an representation is fresh; ETag validation is also becoming prevalent.”
So ETags provide a unique identifier that can work in conjunction with, or in lieu of, the Last-Modified header. When the cached copy is still current, the server can answer a validation request with a small 304 Not Modified response instead of resending the file, which reduces the amount of data sent from the server to the web browser. This certainly sounds like a useful feature to improve overall performance.
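The validation step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular server implements it: the function name `validate` and the ETag values are made up, and the sketch assumes the browser sends a single ETag that is compared as an exact string (real requests may carry a list of ETags or weak validators).

```python
def validate(if_none_match, current_etag):
    """Decide how to answer a GET, given the client's If-None-Match value.

    Hypothetical helper: assumes at most one ETag in the header and
    compares it as an exact string. Returns (status_code, send_body).
    """
    if if_none_match is not None and if_none_match == current_etag:
        # Cached copy is still current: short 304 reply, no body.
        return 304, False
    # No validator was sent, or the representation changed: full response.
    return 200, True


# Made-up ETag values for illustration.
print(validate('"abc123"', '"abc123"'))  # cached copy still valid
print(validate('"abc123"', '"def456"'))  # file changed on the server
print(validate(None, '"abc123"'))        # first visit, nothing cached
```

Only the first call avoids resending the file, which is the saving ETags are meant to provide.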
In 2006, Yahoo! published some performance research in which they state “Our experience shows that reducing the number of HTTP requests has the biggest impact on reducing response time and is often the easiest performance improvement to make.” One outcome of that research was a set of recommendations for reducing the number of HTTP requests, and one of those recommendations was to configure your ETags or simply remove them entirely. If ETags help with caching, then why would Yahoo! suggest removing them?
Let’s see what Yahoo! has to say:
“The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.”
“The end result is ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; …”
“Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.”
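So there you have it. Yahoo! recommends removing ETags because, on a site served from more than one server, mismatched ETags turn the conditional GET requests made whenever the user hits Reload or Refresh into full downloads rather than small 304 responses, and we’re all about eliminating superfluous traffic whenever possible.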
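To make the inode problem concrete, here is a rough Python sketch of an ETag built from the three file attributes Yahoo! mentions. It is loosely modeled on the inode-size-timestamp format and is not Apache's actual code: the function name `file_etag`, the hexadecimal formatting, and the inode, size, and timestamp values are all assumptions chosen for illustration.

```python
def file_etag(inode, size, mtime, use_inode=True):
    """Build a quoted ETag string from file attributes.

    Hypothetical helper: joins the chosen attributes as hex fields.
    With use_inode=False, only size and modification time are used.
    """
    parts = [inode, size, mtime] if use_inode else [size, mtime]
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


# The same file deployed to two servers (made-up values): identical size
# and timestamp, but each filesystem assigned its own inode number.
server_a = dict(inode=1048593, size=5120, mtime=1230768000)
server_b = dict(inode=2097201, size=5120, mtime=1230768000)

# With the inode included, the two servers disagree about the ETag.
print(file_etag(**server_a) == file_etag(**server_b))

# Without the inode, both servers produce the same ETag.
print(file_etag(**server_a, use_inode=False) == file_etag(**server_b, use_inode=False))
```

A browser that cached the file from one server and revalidates against the other would see a mismatch in the first case and receive the whole file again, even though nothing changed.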
There is some interesting discussion in the comments associated with the Yahoo! blog post about removing ETags, with many people extolling the virtues of ETags and suggesting ways to retain them. Some people have recommended removing the inode component of the ETag, resulting in an ETag that includes the file size and timestamp only. For Apache, you can use the FileETag directive as follows:
FileETag MTime Size
In the Yahoo! blog comments, Steve Souders (now with Google) makes the point that an ETag composed of the modification time (MTime) and size provides essentially the same information as the Last-Modified header, and questions whether ETag validation is superior to Last-Modified validation in any case.
Ultimately, configuring or removing ETags is up to you. Since our strategy is to provide a far-future cache expiration date, I am perfectly comfortable removing the ETags. Doing so makes it easy to use the YSlow grade as a quick evaluation tool, but as @zuborg reminds us, actual performance is something you ought to evaluate for yourself.