PHP Performance Series: Caching Techniques
February 27, 2008
Welcome to the first edition of the PHP performance series, a new series in which I will explain ways to gain efficiency and squeeze more performance out of your applications. This first edition, caching techniques, focuses on ways to cache data to optimize your current sites. Some of the concepts here are fairly easy to implement, while others may require strategic design in the architecture of your application. Whether you are working on a high-profile web application or simply a web development farm, these concepts apply to the masses.
Opcode Caching
Opcode caching is likely one of the simplest and most effective ways of increasing performance in PHP. Without one, PHP parses and compiles every script into opcodes on each request, even when the file has not changed. An opcode cache eliminates this unneeded work by storing the compiled opcodes in memory so that files are not recompiled on every request.
There are many opcode caches available, including APC, XCache, eAccelerator and Zend Platform. Choose the one you like best; they all have advantages and disadvantages, which are outside the scope of this article.
File Priming
This is typically more relevant to larger companies that have release processes. When you push out a new release, you typically do not want the caching system to wait until each page is hit before it is compiled into the opcode cache. Instead, you can run a utility script after the release is pushed out that runs each file through the opcode caching extension’s compile function. There is an example of this in my performance overview post, which has a section about file priming for APC. Each of the opcode caches typically has a way to prime files, so just look into the API documentation.
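A priming script along these lines could be run after each release. This is only a sketch: the helper names `find_php_files` and `prime_opcode_cache` and the example path are made up for illustration, and it assumes that either APC's `apc_compile_file()` or, on newer PHP versions, `opcache_compile_file()` is loaded. If neither is available it simply compiles nothing.

```php
<?php
// Hypothetical helper: collect every .php file below $dir.
function find_php_files($dir)
{
    $files = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Hypothetical helper: compile each file into whichever opcode cache is
// loaded. Returns the number of files that were compiled successfully.
function prime_opcode_cache(array $files)
{
    $compiled = 0;
    foreach ($files as $file) {
        if (function_exists('apc_compile_file')) {
            $ok = apc_compile_file($file);
        } elseif (function_exists('opcache_compile_file')) {
            $ok = @opcache_compile_file($file);
        } else {
            $ok = false; // no opcode cache extension loaded
        }
        if ($ok) {
            $compiled++;
        }
    }
    return $compiled;
}

// Example usage (the path is an assumption):
// $count = prime_opcode_cache(find_php_files('/path/to/site'));
// echo "Primed $count files\n";
```

Because the opcode cache normally lives in the web server's memory, a script like this would usually need to be requested through the web server rather than run from the command line.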
Caching Variables
Many opcode caches also allow you to place variable data, also known as userland data, into the cache (typically in memory). This is useful for storing configuration values or data that is expensive to get and unlikely to change.
Example: APC Variables
if (($config = apc_fetch('config')) === false) {
    // Cache miss: load the configuration and store it for later requests.
    require('/path/to/includes/config.php');
    apc_store('config', $config);
}
As a practical example, I stored the parsed XML configuration file of a Zend Framework application in the cache and ran an ab benchmark before and after. Caching saved the parsing time and gave extremely quick access to the configuration.
Figure: APC Variables in Use
The Code
if (($conf = apc_fetch('pbs_config')) === false) {
    $conf = new Zend_Config_Xml(PB_PATH_CONF . '/base.xml', 'production');
    apc_store('pbs_config', $conf);
}
The Benchmark Command
ab -t30 -c5 http://www.example.com/
Results Without The APC Variable
Concurrency Level: 5
Time taken for tests: 30.33144 seconds
Complete requests: 684
Failed requests: 0
Write errors: 0
Results With The APC Variable
Concurrency Level: 5
Time taken for tests: 30.12173 seconds
Complete requests: 709
Failed requests: 0
Write errors: 0
As you can see, we had approximately a 3-4% gain in performance by simply caching our configuration file. There are many other pieces of data that could be placed in this area of memory, increasing your overall performance further. Find a few of these and you will certainly see increases in the number of requests handled. Note that the server being tested on is an older box, and the application includes a large number of files through the Zend Framework.
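The gain can be worked out from the two "Complete requests" figures above. The snippet below is nothing more than arithmetic on those two numbers; the variable names are illustrative.

```php
<?php
// Completed requests reported by ab in the two 30-second runs above.
$without = 684;
$with    = 709;

// Relative gain in completed requests, as a percentage.
$gain = ($with - $without) / $without * 100;

printf("%.2f%% more requests completed\n", $gain); // prints "3.65% more requests completed"
```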
Make sure to check the documentation for each opcode cache to confirm what you can store in the variable cache (some will not automatically serialize objects, so be careful). Further, ensure you have enough memory allocated to the cache for what you store. Lastly, I did not include examples for the other opcode caches here; I simply wanted to show what common usage looks like.
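If a cache does not serialize objects for you, one option is to do it by hand before storing. The sketch below uses a plain array as a stand-in for the cache so that it runs anywhere; `$store`, `cache_put` and `cache_get` are hypothetical names, and in practice the array access inside them would be replaced with your opcode cache's store and fetch functions.

```php
<?php
// Stand-in for a variable cache that can only hold strings (hypothetical).
$store = array();

function cache_put(array &$store, $key, $value)
{
    // serialize() turns arrays and objects into a plain string.
    $store[$key] = serialize($value);
}

function cache_get(array &$store, $key)
{
    if (!isset($store[$key])) {
        return false; // cache miss, mirroring what apc_fetch() returns
    }
    return unserialize($store[$key]);
}

$config = new stdClass();
$config->db_host = 'localhost'; // example values only
$config->debug   = false;

cache_put($store, 'config', $config);
$cached = cache_get($store, 'config');

var_dump($cached == $config); // bool(true): same class and property values
```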
File Caching
Many times there are areas where the server is processing the same page of content that has not changed. There are always opportunities to cache this type of content, whether in part or in full. I’ll attempt to address both areas here from a simplistic point of view, rather than discussing techniques of generating static content that could be utilized by running a static web server.
For the sake of time and being practical with pre-existing tools, I will show the examples using the PEAR Cache_Lite package.
Full File Caching
Full file caching is rather hard to achieve on many sites, because data is pulled for different reasons and sometimes from different sources. However, there are certainly cases where you do not need the “most” up-to-date data available at that very second. Even a 5-10 minute delay on extremely high traffic sites will reward you with a performance increase. It is always good to check your site for these types of areas and to create an easy way to allow for future modification.
While you always have to approach caching from different angles, this is quite possibly the quickest way to add it, though it is certainly not flawless. The following example simply takes a snapshot of the page and stores it for reuse. It is not a complete approach, but it may be good enough for certain users.
I do not recommend this as a long-term solution. If you need something short term and this meets your needs, implement it, but sooner or later you will see the drawbacks of this method, such as content that is never dynamic or pieces of content that need to be updated sooner than others.
The Bootstrap Cache Example:
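require('/path/to/pear/Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10 // seconds
);
$cache = new Cache_Lite_Output($options);

// start() prints the cached page and returns true on a hit; on a miss it
// begins capturing output so that end() can save it.
if (!($cache->start($_SERVER['REQUEST_URI']))) {
    require('/path/to/bootstrap.php');
    $cache->end();
}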
The .htaccess Cache Example:
.htaccess
php_value auto_prepend_file /path/to/cache_start.php
php_value auto_append_file /path/to/cache_end.php
cache_start.php
require('Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10 // seconds
);
$cache = new Cache_Lite_Output($options);

// On a cache hit the page has already been printed, so stop here.
if (($cache->start($_SERVER['REQUEST_URI']))) {
    exit;
}
cache_end.php
$cache->end();
Cache Lite does a lot of the heavy lifting for you, such as file locking and deciding how to save the content based on the id you give it (here we are just using the REQUEST_URI). You may need to take the $_POST, $_COOKIE or even $_SESSION variables into consideration, depending on what you are attempting to achieve.
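One way to account for those variables is to fold them into the cache id. The following is a sketch only: `build_cache_id` and the `lang` cookie are hypothetical, and which inputs matter depends entirely on your application. Requests that carry POST data are simply not cached here.

```php
<?php
// Hypothetical helper: build a cache id from the request URI plus any
// cookies that change what the page looks like. Returns null when the
// request should not be cached at all.
function build_cache_id(array $server, array $post, array $cookie, array $varyCookies)
{
    if (!empty($post)) {
        return null; // form submissions should always reach the application
    }

    $parts = array(isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '/');
    foreach ($varyCookies as $name) {
        $value = isset($cookie[$name]) ? $cookie[$name] : '';
        $parts[] = $name . '=' . $value;
    }

    // Hash the parts so the id stays short and has no awkward characters.
    return md5(implode('|', $parts));
}

$id = build_cache_id($_SERVER, $_POST, $_COOKIE, array('lang'));
// In cache_start.php this could then replace the plain REQUEST_URI:
// if ($id !== null && $cache->start($id)) { exit; }
```

Only the cookies named in the last argument affect the id, so unrelated cookies such as analytics ids do not fragment the cache.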
Partial File Caching
Partial file caching is typically the route where you will see the most benefit overall. You likely have quite a bit of content that does not need to be real-time but that you would like updated once in a while. You may also have specific portions of the site that simply do not need to be updated at all. This is where partial caching comes in and allows you to see performance gains across the board.
Caching Contents Of A String
require('Cache/Lite.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600 // 1 hour
);
$cache = new Cache_Lite($options);

if (($categories = $cache->get('categories')) === false) {
    // Cache miss: build the HTML list from the database.
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = '<ul class="category">';
    while ($row = mysql_fetch_assoc($rs)) {
        // Escape the values so a category name cannot break the markup.
        $categories .= '<li><a href="category.php?id=' . (int) $row['category_id'] . '">'
            . htmlspecialchars($row['category_name']) . '</a></li>';
    }
    $categories .= '</ul>';
    $cache->save($categories, 'categories');
}
echo $categories;
While this is a highly simplistic example, it shows the flexibility to store contents. You could even store an array instead in order to cycle through it at a later time.
Caching An Array Of Results
require('Cache/Lite.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600, // 1 hour
    'automaticSerialization' => true
);
$cache = new Cache_Lite($options);

if (($categories = $cache->get('categories')) === false) {
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = array();
    while ($row = mysql_fetch_assoc($rs)) {
        $categories[] = $row;
    }
    $cache->save($categories, 'categories');
}
var_dump($categories);
As you can see, you can store different types of data through the cache. However, with file caching I would be reluctant to store database data as there are better solutions for that type of role which I will be talking about shortly.
Memory Caching
There are a few different ways to keep caches in memory, including memcached, database memory tables, a RAM disk, and the variable storage of the opcode caches covered at the beginning of this article. It is best to keep in memory the things that are used most often and have a small footprint.
Memcached
From the memcached website:
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Essentially, this means the cache can live on a central server with many servers accessing it. Unlike an opcode cache, it is not tied to your web server, because it runs its own daemon. It is typically used for caching database results, but it is good for other things too, such as session handling: support is already integrated if you change the session handler to “memcache” and point session.save_path at the server running memcached.
Memcache Example
$post_id = (int) $_GET['post_id'];

$memcached = new Memcache;
$memcached->connect('hostname', 11211);

if (($row = $memcached->get('post_id_' . $post_id)) === false) {
    //yes this is safe, we type casted it already ;)
    $rs = mysql_query('SELECT * FROM post WHERE post_id = ' . $post_id);
    if ($rs && mysql_num_rows($rs) > 0) {
        $row = mysql_fetch_assoc($rs);
        // cache compressed for 1 hour
        $memcached->set('post_id_' . $post_id, $row, MEMCACHE_COMPRESSED, time() + 3600);
    }
}
var_dump($row);
This is a fairly typical example of memcached usage. We stored a single item in memory for future use that might be accessed quite a bit. I recommend using this for the records that are accessed the most; that’s what a cache is all about.
Memcache Session Example
session.save_handler = memcache
session.save_path = "tcp://hostname:11211"
As you can see, session handling is quite easy. For multiple memcached servers, list each server in the save_path value, separated by commas.
Database Memory Tables
Database memory tables, while I am not going to give an example, can be useful for session data. You can easily create a table that uses the MEMORY storage engine in MySQL, then create your own session handler to read and write the data there. This is a quick way to boost session performance while keeping sessions shared between multiple web servers. Personally, if you can, I would go the memcached route to keep the load off the database server and let it work on serving other requests.
RAM Disk
While a RAM disk is not distributed, it can be a quick adjustment to make your site perform faster. However, note how much memory you are going to use, and ensure that the directory is mounted on the RAM disk again after a reboot. Remember that information placed in RAM is lost on reboot or power failure.
Bind RAM to a Directory
mount -t tmpfs tmpfs /path/to/site/tmp
I try to avoid this route, as I believe the risk outweighs the gains unless you are dealing with massive servers. There are better tools, such as memcached, that I would trust more.
exit(0);
I hope that this was informative to some of you regarding caching techniques in PHP. I didn’t cover all of the potential caching techniques, such as the caching that RDBMSs do themselves, or other tools such as Squid. I may cover more of these at a later time; if I attempt to get into it all now, this post will never see the light of day. If you have anything to add, send in a comment. Please note that I do not deploy these tactics on everything and anything; I decide when and where they need to be implemented based on the specifics. Take into consideration the scale of the project, the current overall impact, and whether you are optimizing just for the sake of doing it.



