PHP Performance Series: Caching Techniques

by Mike Willbanks on February 27th, 2008

Welcome to the first edition of the PHP performance series, a new series in which I will explain ways to gain efficiency and squeeze more performance out of your applications. This first edition, caching techniques, focuses on ways to cache data to optimize your current sites. Some of the concepts here are fairly easy to implement, while others may require strategic design in the architecture of your application. Whether you are working on a high-profile web application or a small web development shop, these concepts apply across the board.

Opcode Caching

Opcode caching is likely one of the simplest and most effective ways of increasing performance in PHP. Normally PHP parses and compiles every script into opcodes on each request, which wastes a great deal of work. An opcode cache eliminates that overhead by storing the compiled opcodes in memory so files do not have to be recompiled on every request.

There are many opcode caches available: APC, XCache, eAccelerator and Zend Platform, among others. Pick whichever you like best; they all have advantages and disadvantages, which are out of the scope of this article.

File Priming

This is typically more relevant to larger companies that have release processes. When you push out a new release, you typically do not want your caching system to wait until each page is hit before it lands in the opcode cache. Instead, you can run a utility script after the release is pushed out that runs each file through the opcode caching extension's compile function. There is an example of this on my performance overview post, which has a section about file priming for APC. Each of the opcode caches typically has a way to prime files, so look into the API documentation.
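As a rough illustration (the path is a placeholder and the recursive walk is only one way to do it), a priming script for APC might look like this:

// prime_apc.php - run once after a release is pushed out
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/path/to/site')
);

foreach ($files as $file) {
    // only compile PHP source files
    if ($file->isFile() && substr($file->getFilename(), -4) === '.php') {
        // compile the file and store its opcodes in the APC cache
        apc_compile_file($file->getPathname());
    }
}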

Caching Variables

Many opcode caches also allow you to place variable data, also known as user land data, into the cache (typically in memory). This is useful for storing your configuration values or data that is expensive to fetch and will likely not change.

Example: APC Variables

// apc_fetch() returns false on a cache miss
if (($config = apc_fetch('config')) === false) {
    // config.php defines $config; store it for subsequent requests
    require('/path/to/includes/config.php');
    apc_store('config', $config);
}

A practical example of this was with the Zend Framework: I ran an ab benchmark after storing the parsed XML configuration file in the cache. This saved the parsing time and gave extremely quick access to the configuration file.

Figure: APC Variables in Use

The Code

// cache the parsed configuration object so the XML is not re-parsed on every request
if (($conf = apc_fetch('pbs_config')) === false) {
    $conf = new Zend_Config_Xml(PB_PATH_CONF . '/base.xml', 'production');
    apc_store('pbs_config', $conf);
}

The Benchmark Command
ab -t30 -c5 http://www.example.com/

Results Without The APC Variable

Concurrency Level: 5
Time taken for tests: 30.33144 seconds
Complete requests: 684
Failed requests: 0
Write errors: 0

Results With The APC Variable

Concurrency Level: 5
Time taken for tests: 30.12173 seconds
Complete requests: 709
Failed requests: 0
Write errors: 0

As you can see, we had approximately a 3-4% gain in performance simply by caching our configuration file. There are many other pieces of data that could be placed into memory this way, increasing your overall performance. Find a few of these and you will certainly see an increase in the number of requests handled. Note that the server being tested is an older box and the application includes a large number of files through the Zend Framework.

Make sure to check the documentation on each opcode cache to see what you can store in the variable scope (some will not automatically serialize objects, so be careful). Further, ensure you have enough memory allocated to the cache to do this. Lastly, I did not include examples for the other opcode caches here; I simply wanted to show what common usage looks like.
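If the cache you choose does not serialize objects automatically, a rough workaround, sketched here against the same hypothetical 'pbs_config' example from above, is to serialize and unserialize the object yourself:

if (($cached = apc_fetch('pbs_config')) === false) {
    $conf = new Zend_Config_Xml(PB_PATH_CONF . '/base.xml', 'production');
    // store a serialized copy so the object survives caches that only handle strings
    apc_store('pbs_config', serialize($conf));
} else {
    // rebuild the object from the serialized copy
    $conf = unserialize($cached);
}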

File Caching

Many times the server is processing the same page of content even though that content has not changed. There are always opportunities to cache this type of content, whether in part or in full. I'll address both areas here from a simplistic point of view, rather than discussing techniques for generating static content that could be served by a static web server.

For the sake of time and to be practical with pre-existing tools, I will show the examples using the PEAR Cache_Lite package.

Full File Caching

Full file caching is rather hard to achieve on many sites, since we pull data for different reasons and sometimes from different sources. Still, there are certainly cases where you do not need the most up-to-date data at that very second. Even a 5-10 minute delay on an extremely high traffic site will give you a performance increase. It is always worth checking your site for these kinds of areas and building in an easy way to allow for future modification.

While you always have to approach caching from different angles, this is quite possibly the quickest way to add it in, though it is certainly not flawless. The following example simply takes a snapshot of the page and stores it for reuse. It is not a complete solution, but it may be good enough for certain cases.

I do not recommend this as a long-term solution, but if you need something short term and this meets your needs, go ahead and implement it; sooner or later you will run into the drawbacks of this method, such as no content ever being dynamic, or certain pieces of content needing to be updated sooner than others.

The Bootstrap Cache Example:

require('/path/to/pear/Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10
);

$cache = new Cache_Lite_Output($options);

// start() outputs the cached page and returns true on a hit;
// on a miss it begins output buffering and returns false
if (!($cache->start($_SERVER['REQUEST_URI']))) {
    require('/path/to/bootstrap.php');
    $cache->end(); // store and flush the buffered output
}

The .htaccess Cache Example:

.htaccess
php_value auto_prepend_file /path/to/cache_start.php
php_value auto_append_file /path/to/cache_end.php

cache_start.php

require('Cache/Lite/Output.php');
 
$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10
);
 
$cache = new Cache_Lite_Output($options);
 
// on a cache hit the cached page has already been output, so stop here
if (($cache->start($_SERVER['REQUEST_URI']))) {
    exit;
}

cache_end.php

$cache->end(); // store and flush the output buffered since cache_start.php

Cache_Lite does a lot of the heavy lifting for you, such as file locking and deciding how to save the content based on the parameter given (here we are just using the REQUEST_URI as the cache id). You may need to take the $_POST, $_COOKIE or even $_SESSION variables into consideration, depending on what you are attempting to achieve.
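As a rough sketch, a richer cache id could hash the request URI together with whatever else distinguishes the page for a given visitor; the pieces pulled in below (including the 'lang' cookie) are purely illustrative:

require('Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10
);
$cache = new Cache_Lite_Output($options);

// hash everything that makes this page unique for this visitor into one cache id
$id = md5(
    $_SERVER['REQUEST_URI'] .
    serialize($_GET) .
    serialize($_POST) .
    (isset($_COOKIE['lang']) ? $_COOKIE['lang'] : '')
);

if (!($cache->start($id))) {
    require('/path/to/bootstrap.php');
    $cache->end();
}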

Partial File Caching

Partial file caching is typically the route where you will see the most benefit overall. You likely have quite a bit of content that does not need to be real-time but that you would like updated once in a while, and other portions of the site that simply do not need to be updated at all. This is where partial caching comes in and really allows you to see performance gains across the board.

Caching Contents Of A String

require('Cache/Lite.php');
$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600 // 1 hour
);
$cache = new Cache_Lite($options);
// get() returns false on a cache miss
if (($categories = $cache->get('categories')) === false) {
    // rebuild the markup from the database and cache it for next time
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = '<ul class="category">';
    while ($row = mysql_fetch_assoc($rs)) {
        $categories .= '<li><a href="category.php?id=' . $row['category_id'] . '">' .
                                $row['category_name'] . '</a></li>';
    }
    $categories .= '</ul>';
    $cache->save($categories, 'categories');
}
echo $categories;

While this is a highly simplistic example, it shows the flexibility you have in what you store. You could even store an array instead and cycle through it at a later time.

Caching An Array Of Results

require('Cache/Lite.php');
$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600, //1 hour
    'automaticSerialization' => true
);
$cache = new Cache_Lite($options);
if (($categories = $cache->get('categories')) === false) {
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = array();
    while($row = mysql_fetch_assoc($rs)) {
        $categories[] = $row;
    }
    $cache->save($categories, 'categories');
}
var_dump($categories);

As you can see, you can store different types of data in the cache. However, with file caching I would be reluctant to store database data, as there are better solutions for that role, which I will talk about shortly.

Memory Caching

There are a few different ways to cache in memory, including memcached, database memory tables, a RAM disk, and the opcode caches' variable caching covered at the beginning of this article. It is best to keep in memory the things that are used most often and have a small footprint.

Memcached

From the memcached website:

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Essentially, this is saying that memcached runs on a central server (or servers) that many web servers can access, that it is not tied into your web server the way an opcode cache is since it runs its own daemon, and that it is typically used for caching database results. That doesn't mean there aren't other things it is good for, such as session handling, which is already integrated: just set the session handler to "memcache" and change your session.save_path to point at the server running memcached.

Memcache Example

$post_id = (int) $_GET['post_id'];
$memcached = new Memcache;
$memcached->connect('hostname', 11211);
if (($row = $memcached->get('post_id_' . $post_id)) === false) {
    //yes this is safe, we type casted it already ;)
    $rs = mysql_query('SELECT * FROM post WHERE post_id = ' . $post_id);
    if ($rs && mysql_num_rows($rs) > 0) {
        $row = mysql_fetch_assoc($rs);
        // cache compressed for 1 hour
        $memcached->set('post_id_' . $post_id, $row, MEMCACHE_COMPRESSED, time() + 3600);
    }
}
var_dump($row);

This is a fairly typical example of memcached usage. We stored a single record in memory for future requests that might access it quite a bit. I recommend using this for the records that are accessed most often; that's what a cache is all about.

Memcache Session Example

session.save_handler = memcache
session.save_path    = "tcp://hostname:11211"

As you can see, session handling is quite easy. For multiple memcached servers, comma-separate the servers in the save_path value.
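For example, with two hypothetical servers the settings would look like this:

session.save_handler = memcache
session.save_path    = "tcp://hostname1:11211,tcp://hostname2:11211"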

Database Memory Tables

Database memory tables can be useful for session data; I am not going to walk through a full implementation, but a rough sketch follows below. You can easily create a table using the MEMORY storage engine in MySQL, write your own session handler, and store the data that way. This is a quick way to boost session performance while keeping sessions shared between multiple web servers. Personally, if you can, I would go the memcached route to keep the load off the database server and let it work on serving other requests.
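As a rough sketch only (the table and column names are made up, and note that MEMORY tables cannot hold TEXT or BLOB columns, so the session data has to fit in a VARCHAR), the table plus the read and write callbacks of a custom handler might look like this:

// one-time setup (sizes are only an assumption):
// CREATE TABLE session_data (
//     session_id CHAR(32)      NOT NULL PRIMARY KEY,
//     data       VARCHAR(4096) NOT NULL,
//     modified   INT UNSIGNED  NOT NULL
// ) ENGINE=MEMORY;

// minimal read/write callbacks; a real handler also needs open, close,
// destroy and gc callbacks registered via session_set_save_handler()
function mem_session_read($id)
{
    $rs = mysql_query("SELECT data FROM session_data WHERE session_id = '"
        . mysql_real_escape_string($id) . "'");
    $row = ($rs) ? mysql_fetch_assoc($rs) : false;
    return $row ? $row['data'] : '';
}

function mem_session_write($id, $data)
{
    return (bool) mysql_query("REPLACE INTO session_data (session_id, data, modified) VALUES ('"
        . mysql_real_escape_string($id) . "', '"
        . mysql_real_escape_string($data) . "', " . time() . ")");
}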

RAM Disk

While utilizing your RAM as a disk is not distributed, it can be a quick adjustment to make your site perform faster. However, note the amount of memory you are going to use and ensure that on reboot the directory is mounted back onto the RAM disk. Remember that information placed in RAM is lost on reboot or power failure.

Mount a RAM Disk (tmpfs) on a Directory

mount -t tmpfs tmpfs /path/to/site/tmp
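To have the mount come back after a reboot, an /etc/fstab entry along these lines can be added (the size option is just an example):

tmpfs   /path/to/site/tmp   tmpfs   defaults,size=128m   0 0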

I tend to avoid this route, as I believe the risk outweighs the gains unless you are dealing with massive servers, and there are better tools, such as memcached, that I trust more.

exit(0);

I hope this was informative to some of you regarding caching techniques in PHP. I didn't cover all of the potential caching techniques, such as the caching that the RDBMSs themselves do, or other items such as Squid. I may cover more of these at a later time; if I tried to get into it all now, this post would never see the light of day. If you have anything to add, send in a comment. Please note, I do not deploy these tactics on everything and anything; I decide when and where they need to be implemented. Take into consideration the scale of the project, the overall impact, and whether you are just optimizing for the sake of doing it.


22 Comments
  1. I’m surprised you didn’t mention caching proxies. If it’s possible to do full page caching, it’s usually possible (and often a better option) to use a caching proxy such as Squid Cache.

  2. Whoa. That’s the same topic I presented at my PHP meetup this month.

    The slides from my presentation are up at http://www.slideshare.net/csixty4/caching-data-for-performance if you or any of your readers are interested.

  3. @Rob –
    Actually I mentioned squid towards the end of the post. It is something that I would rather mention further when I talk about some of the ways to increase performance on the HTTP level rather than at the PHP level. Since it is a proxy, I felt that it was slightly out of scope in this area since we are not dealing with PHP directly or using any PHP functions to control it overall. I certainly agree with you here and have used it in the past many times to speed up sites.

    @Dave –
    Interesting, I guess we are being timely on this! I actually started this a few weeks ago and just finished it up last night. I thought about adding more information, such as what Rob was talking about, database-level memory tables, and a few more techniques, but I stayed away from them or this post would never see the light of day.

  4. Another excellent article!

    Just be careful caching things like sessions in memcached. It's not difficult to create sessions (and other data) that are too large to be stored in memcached. It's always worth using a caching solution that will fall back to caching to disk if the write to memcached fails.

  5. Yup, good suggestions. On Clicky, we use memcached, eAccelerator, and partial file caching (although the caching is a home-rolled solution, as I'd rather die than install the bloated monstrosity that is PEAR).

    Proxies can work well too although they are a bitch to configure and we just had too many problems with them on such a dynamic site. Instead we added some good file caching techniques with htaccess and some apache extensions (mod_expires, mainly). This helped a bunch too.

  6. @Stuart –
    Thanks! I forgot to mention memcached's default 1MB item size limit. It certainly is easy to go over that limit.

    @Sean –
    I also typically go with homebrew solutions, or use PEAR packages without dependencies by downloading them into a lib folder that I keep for my classes.

    Proxies can be rather hard to configure at first, however, at a different company we utilized squid quite heavily on a particular site that ended up working great. Basically we set the accelerator up instead of a full proxy and let certain things pass through dependent on the conditions. It does take a while to get it up and going but certainly is a great performance gain after you get it all configured.

    There are certainly great ways to help cache items on the user's side such as you stated. I am actually going to address those individually, as they don't really make a difference to code as much as they make a difference to the headers that are being sent to the user's browser.

  7. Sorry, I missed that. About an object cache which backs on to the filesystem, there’s Sharedance. I wrote a php interface for it a while ago with redundancy built in, it’s called PHPDance.

  8. @Rob –
    The thing that makes me a little wary of Sharedance is that the package hasn't had an update since February of 2006, two years ago. It would also be useful if they provided documentation, resource levels, benchmarks and a PHP extension. In the base PHP class it seems to waste quite a bit of time closing and then reopening the connection instead of holding a persistent connection for the duration of the request; further adding to my worries, it connects, sends the data and then reads, all through fsockopen, fwrite and fread. If this ultimately uses the hard drive, the latency is going to be much worse than just hitting the hard disk on a local server or the database that already has a connection open.

    Sure, some items would be offloaded to a separate server, but I really do not see a performance benefit to this; rather a drawback, due to the extra work that is being done.

  9. I know what you mean about trusting what appears to be a dead project. Ideally I'd like to see a fork/extension of memcached which backs on to disk (I don't know if there already is one, I haven't looked in a long while). That's an interesting point about keeping the connection open across transactions, I'll have a look into that, thanks.
    With regard to the latency, I think you're misunderstanding what Sharedance does. When you write to Sharedance it doesn't block until it's written to disk; it writes into memory and writes it to disk in the background, so it's still pretty damned quick. It is quite a bit slower than memcached though. The real benefit of PHPDance is the redundancy (which can be implemented with memcached, but the backing on to disk helps out with blips). If you have a cluster of 5 nodes in your caching architecture and one of them goes down, you've potentially corrupted your entire cache. People browsing on servers which are not affected will suddenly lose their sessions. What PHPDance allows you to do is trade a little performance for some reliability by writing to more than one node, so that a node can drop off the cluster without causing problems. If you have a good load balancer, even people who were browsing on that node won't notice.

  10. $cache->store($categories, ‘categories’);

    I can't find a “store” method in the Cache_Lite manual. The correct method is Cache_Lite::save().

  11. @Andrey –
    Thanks, I’ve updated the post. I must have missed that when running through the post :)

  12. Herve permalink

    With PEAR Cache_Lite, this kind of example should not exist anymore:

    if (!($categories = $cache->get(‘categories’))) {

    You can have 0 as the result of an SQL query or from anything else, and in that case the test is wrong.
    In your test, you should use a strict comparison like:
    if ($Cache_Lite->get($id) === false) {
    }

  13. @Herve –
    That is true; however, this is about knowing your data set as well as the results it returns. I updated the examples based on your feedback. However:

    if ($Cache_Lite->get($id) === false) {

    That is inefficient, because you would then have to call get() again to retrieve the value. The following would be better (though less readable):

    if (($var = $Cache_Lite->get($id)) === false) {

    You could certainly write the above in 2 lines, but I like the single line for simplicity's sake. Here is the two-line version:

    $var = $Cache_Lite->get($id);
    if ($var === false) {

  14. Chantu permalink

    Dear Sir

    Thank you for the lovely and excellent article about caching.

    It brings all of the PHP caching technologies together in one place, which helps newbies like me learn so much about caching.

    Thanks
    Chantu

  15. Which one is better: caching a website page's blocks on the file system or in the database?

  16. Caching and reading from PHP is not bad, but using more advanced methods and going around PHP can bring you much better performance. I did a benchmark on my blog where I compared reading the cache from PHP versus reading it from Apache; you can read more about it here:
    http://sven.webiny.com/advanced-cache-mechanism-using-php-cpp-and-apache/

  17. A technique I use, in a similar vein, is caching data retrieved from the database for the lifetime of a user session. It sounds simple, in that you just assign the dataset to a variable, but utilising singleton classes or static properties in a non-static class is how I achieve it. I find it results in a massive performance increase when your application is making multiple tiny calls to a highly dynamic DB, or is heavily riddled with design-by-contract elements. A good example of this would be where you need to check if User X has Right Y. There might be 15 instances in one operation where you have to perform a check like this, but if you store the DB result in a static property of a class, it's much quicker to check the value of the property than make 15 round-trips to the DB server.
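    A rough sketch of that idea (the class, table and column names are hypothetical):

    class RightsCache
    {
        // per-request cache of rights lookups, keyed by "userId:rightId"
        private static $cache = array();

        public static function userHasRight($userId, $rightId)
        {
            $key = (int) $userId . ':' . (int) $rightId;
            if (!array_key_exists($key, self::$cache)) {
                // only the first check for a given pair hits the database
                $rs = mysql_query('SELECT COUNT(*) AS c FROM user_right WHERE user_id = '
                    . (int) $userId . ' AND right_id = ' . (int) $rightId);
                $row = mysql_fetch_assoc($rs);
                self::$cache[$key] = ($row['c'] > 0);
            }
            return self::$cache[$key];
        }
    }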

