PHP Performance Series: Caching Techniques

February 27, 2008

Welcome to the first edition of the PHP performance series, a new series in which I will explain ways to gain efficiency and squeeze more performance out of your applications. This first edition, caching techniques, focuses on ways to cache data to optimize your current sites. Some of these concepts are fairly easy to implement, while others may require strategic design in your application's architecture. Whether you are working on a high-profile web application or simply a web development farm, these concepts apply broadly.

Opcode Caching

Opcode caching is likely one of the simplest and most effective ways to increase performance in PHP. Without one, PHP parses and compiles each script into opcodes on every request and then throws that work away. An opcode cache eliminates this inefficiency by storing the compiled opcodes in memory, so files are not recompiled on every request.

There are many opcode caches available, including APC, XCache, eAccelerator and Zend Platform. Choose whichever you like best; they all have advantages and disadvantages, which are outside the scope of this article.
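A quick way to confirm that one of these caches is actually active is to ask PHP which extensions are loaded. This sketch assumes the usual lowercase module names `apc`, `xcache` and `eaccelerator`; Zend Platform is left out because its module name is not covered here. The helper name `loaded_opcode_caches()` is illustrative.

```php
<?php
// Sketch: lists which of the opcode caches named above are loaded in the
// current PHP process. Module names are assumptions (see lead-in).
function loaded_opcode_caches()
{
    $candidates = array(
        'apc'          => 'APC',
        'xcache'       => 'XCache',
        'eaccelerator' => 'eAccelerator',
    );
    $found = array();
    foreach ($candidates as $extension => $label) {
        if (extension_loaded($extension)) {
            $found[] = $label;
        }
    }
    return $found;
}

print_r(loaded_opcode_caches());
```

Keep in mind that the CLI and the web server can load different php.ini files, so run a check like this through the web server to see what your pages actually get.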

File Priming

This is typically more relevant to larger companies that have release processes. When you push out a new release, you typically do not want your opcode cache to wait until each page is hit before the file is compiled and cached. Instead, you can run a utility script after the release is pushed out that runs each file through the opcode cache extension's compile function. My performance overview post has a section on file priming for APC with an example of this. Each opcode cache typically has its own way to prime files, so check its API documentation.
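A minimal priming sketch, assuming APC's `apc_compile_file()` as the default compile function; pass a different callable for another opcode cache. The function name `prime_opcode_cache()` and the restriction to `.php` files are illustrative choices, not the script from the performance overview post.

```php
<?php
// Sketch: walks a release directory and compiles every .php file into the
// opcode cache. $compile defaults to APC's apc_compile_file(); any callable
// that takes a path and returns true on success will do.
function prime_opcode_cache($rootDir, $compile = 'apc_compile_file')
{
    $primed = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only PHP source files are worth compiling.
        if ($file->isFile() && $file->getExtension() === 'php') {
            if (call_user_func($compile, $file->getPathname())) {
                $primed++;
            }
        }
    }
    return $primed; // number of files the cache accepted
}
```

One caveat: APC's shared memory belongs to the web server's PHP processes, while a CLI process gets its own cache that disappears when it exits. A priming script like this generally has to be triggered through the web server (for example from a protected URL) for the compiled files to land in the cache your pages use.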

Caching Variables

Many opcode caches also let you place variable data, also known as user-land data, into the cache (typically in memory). This is useful for storing configuration values or data that is expensive to retrieve and unlikely to change.

Example: APC Variables

// On a cache miss, config.php is expected to define $config.
if (($config = apc_fetch('config')) === false) {
    require('/path/to/includes/config.php');
    apc_store('config', $config);
}
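The `=== false` check above treats a cached value of `false` as a miss. A small wrapper can avoid that by using the optional second argument of `apc_fetch()`, which is set to whether the key was found. This is a sketch; the helper name `cache_remember()` and its TTL parameter are illustrative, not part of APC.

```php
<?php
// Sketch of a "fetch or compute" helper around APC's user cache.
// apc_fetch()'s second argument reports whether the key existed, so a cached
// false is not mistaken for a miss. If the APC functions are unavailable,
// the value is simply computed on every call.
function cache_remember($key, $ttl, $compute)
{
    if (!function_exists('apc_fetch')) {
        return call_user_func($compute);
    }
    $hit = false;
    $value = apc_fetch($key, $hit);
    if ($hit) {
        return $value;
    }
    $value = call_user_func($compute);
    apc_store($key, $value, $ttl); // $ttl in seconds; 0 keeps it until cleared or evicted
    return $value;
}
```

For the configuration example above, the callback would `require` config.php and `return $config;`.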

As a practical test, I used the Zend Framework and ran an ab (ApacheBench) benchmark after storing the parsed XML configuration file in the cache. This saved the XML parsing time on every request and gave extremely quick access to the configuration.

Figure: APC Variables in Use

The Code

if (($conf = apc_fetch('pbs_config')) === false) {
    $conf = new Zend_Config_Xml(PB_PATH_CONF . '/base.xml', 'production');
    apc_store('pbs_config', $conf);
}

The Benchmark Command
ab -t30 -c5 http://www.example.com/

Results Without The APC Variable

Concurrency Level: 5
Time taken for tests: 30.33144 seconds
Complete requests: 684
Failed requests: 0
Write errors: 0

Results With The APC Variable

Concurrency Level: 5
Time taken for tests: 30.12173 seconds
Complete requests: 709
Failed requests: 0
Write errors: 0

As you can see, simply caching our configuration file gave approximately a 3-4% gain in performance: 709 completed requests instead of 684 in roughly the same 30 seconds. There are many other values that could be stored in this memory to further increase overall performance; find a few of them and you will certainly see an increase in the number of requests handled. Note that the test server is an older machine and that the application includes a large number of Zend Framework files.
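For reference, the gain can be recomputed from the ab output above. Comparing completed requests gives the headline figure, while dividing by the slightly different test durations gives requests per second (ab also prints a "Requests per second" line, which is not shown above). The helper name is illustrative.

```php
<?php
// Recomputes the benchmark comparison from ab's "Complete requests" and
// "Time taken for tests" values shown above.
function requests_per_second($completed, $seconds)
{
    return $completed / $seconds;
}

$without = requests_per_second(684, 30.33144); // about 22.55 req/s
$with    = requests_per_second(709, 30.12173); // about 23.54 req/s

$moreRequests = (709 / 684 - 1) * 100;         // about 3.7% more requests completed
$throughput   = ($with / $without - 1) * 100;  // about 4.4% more requests per second

printf("%.2f -> %.2f req/s (+%.1f%%)\n", $without, $with, $throughput);
```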

Make sure to check the documentation for each opcode cache to confirm what you can store in the variable cache (some will not automatically serialize objects, so be careful). Also ensure that you have allocated enough memory for the data you plan to store. Lastly, I did not include examples for the other opcode caches; I simply wanted to show what common usage looks like.

File Caching

Often the server processes the same page of content over and over even though it has not changed. There are always opportunities to cache this type of content, either in part or in full. I’ll attempt to address both here from a simplistic point of view, rather than discussing techniques for generating static content that could be served by a static web server.

For the sake of time, and to stay practical with existing tools, I will show the examples using the PEAR Cache_Lite package.

Full File Caching

Full file caching is hard to achieve on many sites, since pages pull data for different purposes and sometimes from different sources. Even so, there are certainly cases where you do not need the “most” up-to-date data at that very second. On an extremely high-traffic site, even a 5-10 minute delay will give you a performance increase. It is always worth checking your site for these kinds of areas and making them easy to modify in the future.

Caching always has to be approached from several angles; this is quite possibly the quickest way to add it, but it is certainly not flawless. The following example simply takes a snapshot of the page and stores it for reuse. It is not a complete approach, but it may suit certain users.

I do not recommend this as a long-term solution. If you need something short-term and this meets your needs, go ahead and implement it, but sooner or later you will see its drawbacks, such as no content ever being dynamic, or certain pieces of content needing to be updated sooner than others.

The Bootstrap Cache Example:

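require('/path/to/pear/Cache/Lite/Output.php');

$options = array(
	'cacheDir' => '/tmp/', // Cache_Lite requires the trailing slash
	'lifeTime' => 10       // seconds before a cached page expires
);

$cache = new Cache_Lite_Output($options);

// start() echoes the cached page and returns true on a hit;
// on a miss it starts output buffering and returns false.
if (!($cache->start($_SERVER['REQUEST_URI']))) {
	require('/path/to/bootstrap.php');
	$cache->end(); // saves the buffered output to the cache and sends it
}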

The .htaccess Cache Example:

.htaccess
php_value auto_prepend_file /path/to/cache_start.php
php_value auto_append_file /path/to/cache_end.php

cache_start.php

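require('Cache/Lite/Output.php');

$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 10
);

$cache = new Cache_Lite_Output($options);

// On a hit, start() has already echoed the cached page, so stop here.
// Exiting also skips auto_append_file, so cache_end.php does not run.
if (($cache->start($_SERVER['REQUEST_URI']))) {
    exit;
}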

cache_end.php

$cache->end();

Cache_Lite does a lot of the heavy work for you, such as file locking and deciding where to save the content based on the ID you give it (here we are just using the request URI). Depending on what you are trying to achieve, you may need to take the $_POST, $_COOKIE or even $_SESSION variables into consideration.
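One way to take request state into account is to build the cache ID from more than the URI, and to skip caching entirely for requests that should never be served from a snapshot. The sketch below assumes that only GET requests are cacheable and that a hypothetical list of cookie names (such as a language cookie) changes the rendered page; `build_page_cache_id()` is an illustrative helper, not part of Cache_Lite. By default Cache_Lite hashes the ID into a file name (its fileNameProtection option), so the ID can contain arbitrary characters.

```php
<?php
// Hypothetical helper: builds a Cache_Lite ID from the request, or returns
// null when the request should bypass the page cache entirely.
function build_page_cache_id(array $server, array $cookies, array $varyCookies = array())
{
    // POST (and other non-GET) requests usually change state: never cache them.
    if (isset($server['REQUEST_METHOD']) && $server['REQUEST_METHOD'] !== 'GET') {
        return null;
    }
    $parts = array($server['REQUEST_URI']);
    foreach ($varyCookies as $name) {
        // Only cookies that change the rendered page become part of the key.
        $parts[] = $name . '=' . (isset($cookies[$name]) ? $cookies[$name] : '');
    }
    return implode('|', $parts);
}
```

In cache_start.php you would compute the ID first and only call `$cache->start($id)` when it is not null; cache_end.php would then need to check that caching was actually started before calling `$cache->end()`.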

Partial File Caching

Partial file caching is typically where you will see the most benefit overall. You likely have quite a bit of content that does not need to be real-time but should still be refreshed once in a while. You may also have specific portions of the site that do not need to be updated at all. This is where partial caching comes in and can deliver performance gains across the board.

Caching Contents Of A String

require('Cache/Lite.php');
$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600 //1 hour
);
$cache = new Cache_Lite($options);
if (($categories = $cache->get('categories')) === false) {
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = '<ul class="category">';
    while($row = mysql_fetch_assoc($rs)) {
        // Cast the ID and escape the name so database content cannot inject HTML.
        $categories .= '<li><a href="category.php?id=' . (int) $row['category_id'] . '">' .
                       htmlspecialchars($row['category_name']) . '</a></li>';
    }
    $categories .= '</ul>';
    $cache->save($categories, 'categories');
}
echo $categories;

While this is a highly simplistic example, it shows how flexible the storage is. You could even store an array instead and loop over it later.

Caching An Array Of Results

require('Cache/Lite.php');
$options = array(
    'cacheDir' => '/tmp/',
    'lifeTime' => 3600, //1 hour
    'automaticSerialization' => true
);
$cache = new Cache_Lite($options);
if (($categories = $cache->get('categories')) === false) {
    $rs = mysql_query('SELECT category_id, category_name FROM category');
    $categories = array();
    while($row = mysql_fetch_assoc($rs)) {
        $categories[] = $row;
    }
    $cache->save($categories, 'categories');
}
var_dump($categories);

As you can see, you can store different types of data in the cache. However, with file caching I would be reluctant to store database data, as there are better solutions for that role, which I will talk about shortly.

Memory Caching

There are a few different ways to cache in memory, including memcached, database memory tables, a RAM disk, and the opcode cache's variable storage described at the beginning of this article. It is best to keep in memory the data that is used most often and has a small footprint.

Memcached

From the memcached website:

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Essentially, this means memcached can run on a central server that many web servers access. Unlike an opcode cache, it is not tied to your web server, since it runs its own daemon. It is typically used for caching database results, though it is good for other things too, such as session handling: the memcache extension already integrates with PHP sessions if you set the session handler to “memcache” and point session.save_path at the server running memcached.

Memcache Example

$post_id = (int) $_GET['post_id'];
$memcached = new Memcache;
$memcached->connect('hostname', 11211);
if (($row = $memcached->get('post_id_' . $post_id)) === false) {
    //yes this is safe, we type casted it already ;)
    $rs = mysql_query('SELECT * FROM post WHERE post_id = ' . $post_id);
    if ($rs && mysql_num_rows($rs) > 0) {
        $row = mysql_fetch_assoc($rs);
        // cache compressed for 1 hour
        $memcached->set('post_id_' . $post_id, $row, MEMCACHE_COMPRESSED, time() + 3600);
    }
}
var_dump($row);

This is a fairly typical memcached example: we store a single record in memory for future requests that might access it quite a bit. I recommend doing this for the records that are accessed most; that's what a cache is all about.

Memcache Session Example

session.save_handler = memcache
session.save_path    = "tcp://hostname:11211"

As you can see, session handling is quite easy. For multiple memcached servers, list each server in the save_path value, separated by commas.
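The same settings can also be applied at runtime. This sketch assumes the pecl memcache extension (which provides the `memcache` session handler) and hypothetical host names; both helper names are illustrative. It builds the comma-separated save_path from a list of servers and leaves PHP's default handler in place when the extension is missing. Call it before `session_start()`.

```php
<?php
// Hypothetical helper: turns host:port pairs into the comma-separated
// session.save_path format used by the memcache session handler.
function memcache_session_save_path(array $servers)
{
    $urls = array();
    foreach ($servers as $server) {
        $urls[] = 'tcp://' . $server;
    }
    return implode(',', $urls);
}

// Switches sessions to memcache only if the extension is available;
// returns false (keeping the default handler) otherwise.
function use_memcache_sessions(array $servers)
{
    if (!extension_loaded('memcache')) {
        return false;
    }
    ini_set('session.save_handler', 'memcache');
    ini_set('session.save_path', memcache_session_save_path($servers));
    return true;
}
```

For example, `use_memcache_sessions(array('cache1:11211', 'cache2:11211'))` produces the same save_path as listing both servers in php.ini.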

Database Memory Tables

Database memory tables can be useful for session data, although I won't go into a full example here. In MySQL you can easily create a table that uses the MEMORY storage engine, then write your own session handler that reads and writes that table. This is a quick way to boost session performance while also sharing sessions between multiple web servers. Personally, if you can, I would go the memcached route to keep the load off the database server and let it work on serving other requests.
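For anyone who wants a starting point, here is a minimal sketch of such a handler, written for PHP 7 or later and assuming PDO plus a hypothetical `php_session` table; the class name is illustrative. It implements PHP's `SessionHandlerInterface` and uses `REPLACE INTO`, which MySQL supports. Note that MEMORY tables cannot hold TEXT or BLOB columns, so the payload goes into a bounded VARCHAR, and the whole table is emptied whenever the MySQL server restarts.

```php
<?php
// Hypothetical MySQL table for this sketch:
//   CREATE TABLE php_session (
//       session_id VARCHAR(128)  NOT NULL PRIMARY KEY,
//       data       VARCHAR(8192) NOT NULL,
//       updated_at INT UNSIGNED  NOT NULL
//   ) ENGINE=MEMORY;
class MemoryTableSessionHandler implements SessionHandlerInterface
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function open($savePath, $sessionName): bool
    {
        return true; // the PDO connection is already open
    }

    public function close(): bool
    {
        return true;
    }

    public function read($id): string
    {
        $stmt = $this->db->prepare('SELECT data FROM php_session WHERE session_id = ?');
        $stmt->execute(array($id));
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data; // '' means "no session yet"
    }

    public function write($id, $data): bool
    {
        // REPLACE INTO inserts a new row or overwrites the existing one.
        $stmt = $this->db->prepare(
            'REPLACE INTO php_session (session_id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute(array($id, $data, time()));
    }

    public function destroy($id): bool
    {
        $stmt = $this->db->prepare('DELETE FROM php_session WHERE session_id = ?');
        return $stmt->execute(array($id));
    }

    public function gc($maxLifetime): int
    {
        // Remove sessions that have not been written within $maxLifetime seconds.
        $stmt = $this->db->prepare('DELETE FROM php_session WHERE updated_at < ?');
        $stmt->execute(array(time() - $maxLifetime));
        return $stmt->rowCount();
    }
}
```

Register it with `session_set_save_handler(new MemoryTableSessionHandler($pdo), true);` before calling `session_start()`.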

RAM Disk

Using your RAM as a disk is not distributed, but it can be a quick adjustment that makes your site perform faster. However, keep track of how much memory you will be using, and make sure the directory is mounted on the RAM disk again after a reboot. Remember that anything placed in RAM is lost on reboot or power failure.

Mount a tmpfs RAM Disk on a Directory

mount -t tmpfs tmpfs /path/to/site/tmp
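Because the directory silently reverts to an ordinary on-disk directory if the mount is not restored after a reboot, a cheap safeguard is to check the mount before relying on it. This sketch assumes Linux, where /proc/mounts lists each mount as "device mountpoint type options ..."; `is_tmpfs_mount()` is a hypothetical helper name.

```php
<?php
// Sketch (Linux only): reports whether $dir is currently a tmpfs mount point
// by scanning /proc/mounts.
function is_tmpfs_mount($dir)
{
    $mounts = @file('/proc/mounts', FILE_IGNORE_NEW_LINES);
    if ($mounts === false) {
        return false; // not Linux, or /proc is unavailable
    }
    $dir = rtrim($dir, '/');
    foreach ($mounts as $line) {
        $fields = explode(' ', $line);
        if (count($fields) < 3) {
            continue;
        }
        // /proc/mounts escapes spaces in paths as \040; decode before comparing.
        $mountPoint = str_replace('\040', ' ', $fields[1]);
        if ($mountPoint === $dir && $fields[2] === 'tmpfs') {
            return true;
        }
    }
    return false;
}
```

A cache bootstrap could, for example, call `is_tmpfs_mount('/path/to/site/tmp')` and log a warning or fall back to a disk directory when it returns false.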

I try to avoid this route, as I believe the risk outweighs the gains unless you are dealing with massive servers. There are better tools, such as memcached, that I would trust more.

exit(0);

I hope this was informative regarding caching techniques in PHP. I didn’t cover every potential caching technique, such as the caching that RDBMSs do themselves, or other tools such as Squid. I may cover more of these later; if I attempted to get into it all now, this post would never see the light of day. If you have anything to add, send in a comment. Please note that I do not deploy these tactics on anything and everything; I decide case by case when and where they need to be implemented. Take into consideration the scale of the project, the current overall impact, and whether you are just optimizing for the sake of it.

Over Engineering Software

February 17, 2008

Many times as developers, we tend to over-engineer our projects because we foresee features we may want in the future, even though there is no need for them yet. This is a real hindrance to ever getting our software out the door, because we keep refactoring, extending and complicating it for what end users might need in the future, or even for edge cases that 90% of users will never hit.

I have come to the distinct conclusion that when we start to over-engineer a project, the chances of it ever getting out the door get slimmer by the second. I am not saying it is bad to look at what is coming in the future; however, we need to develop for the need at hand first and extend later, without coding ourselves into a box. This means ensuring that your data model, code structure and business logic are laid out so that when things change, you aren’t constantly having to modify large amounts of code. Remember, OOP (Object Oriented Programming) is your friend.

Let's take, for instance, authentication, authorization and privileges. Presently, your software needs a single login; no groups or specific privileges are needed. You know you will need them in the future, but why develop them now? Simply build the login in a way that can be expanded later and move on. Keep the rest on a future to-do list and cross that bridge when it is actually essential. This might not be the best example, as many large applications we develop may need this out of the box. Looking specifically at development time, though, it would be simple to create the architecture behind it now and implement the details later.

A practice I sometimes use, when I know I will be building something out in the future, is to create the class definitions with methods that simply return true. Say we are going to implement groups in the future, but they are not needed at the moment. Going back through each section of code later to add the checks would take more time than adding them at the beginning. With the class definition in place and the method returning true, you can call a method to check whether the group or user has access to the page (which for now always returns true), and when you implement the business logic later, no large-scale change is needed. Again, as I stated earlier, authorization and authentication may not be the best example here.
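A sketch of that practice, using hypothetical names (`Authorization`, `isAllowed()`, `render_admin_page()`): the check is wired into the call site now and always passes, so adding groups later means changing one method rather than every page.

```php
<?php
// Hypothetical stub: groups and privileges are not implemented yet,
// so every check passes. Call sites are written against this API now.
class Authorization
{
    /**
     * @param int    $userId   the logged-in user
     * @param string $resource the page or action being requested
     * @return bool  always true until groups and privileges exist
     */
    public function isAllowed($userId, $resource)
    {
        // Later: look up the user's groups and privileges here.
        return true;
    }
}

// Call site, written once and left untouched when the real check arrives.
function render_admin_page(Authorization $auth, $userId)
{
    if (!$auth->isAllowed($userId, 'admin/settings')) {
        return 'Access denied';
    }
    return 'Settings page';
}

echo render_admin_page(new Authorization(), 42), "\n";
```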

You will likely have spent slightly more time up front, but in the end you will have saved a large amount of time. You will probably miss a few areas with that base check, but things should be in good shape at this stage, and you have saved yourself hours of writing business logic before it was needed. This certainly doesn’t apply to every feature you think of; that would be madness and would completely defeat the purpose of this post.

A simple way to gauge whether you should develop a feature now or later: do all of these apply?

  1. Is this critical to my users' success?
  2. Is the application crippled (unusable) without this feature?
  3. Is the cost to develop it now 50-75% less than it would be in the future?
  4. Is there a business need to support this feature?

I believe these are all critical questions to ask yourself while developing. Otherwise, you may end up with 20 unfinished projects that will never see the light of day: the enthusiasm of starting the project fades, and because nothing has shipped, there is no community to help rebuild that enthusiasm. At that point you are burnt out and bored, and the project may never see the light of day again, regardless of how great the idea was in the first place or what problems it might have solved.
