Performance Tuning Overview

by Mike Willbanks on January 31st, 2008

Introduction

During a conversation on the TCPHP mailing list yesterday about frameworks and scalability, I wrote a general reply on performance tuning for larger sites. The focus of this post is not performance items specific to individual PHP frameworks, since many bottlenecks apply before the framework ever runs and should certainly be solved up front. Instead, I attempt to look at simple items that can be deployed to produce a finer tuned system.

I don’t believe I need to state why performance is important; we all know it can make or break a site, an application or even a business. To be clear up front: I do not deploy every method listed here on every server, site and application. It truly depends on what I am expecting in traffic levels, server resources, etc.

PHP Performance Tuning

There are many methods available to help you tune PHP’s performance. Some require little to no effort, while others take more analysis to find the opportunities.

Opcode Caching

The number one place to start is to ensure that you are using an opcode cache. An opcode cache caches and optimizes PHP’s intermediate code, which gives you a very large performance gain in comparison to plain PHP.

Your options here include APC, XCache, eAccelerator and Zend Platform. I recommend APC or XCache; you decide.

APC File Priming

If you are utilizing APC, a great step during a release is to prime the cache: run each file through apc_compile_file so it is stored in the bytecode cache up front, bypassing all of the filters. This lets you build your cache quickly and effectively. Here is an example PHP script (set apc.enable_cli=1 if you want to run it from the CLI). The script below does little checking on the directory you pass, so you may want to add that if you’d like.

if (!function_exists('apc_compile_file')) {
    echo "ERROR: apc_compile_file does not exist!";
    exit(1);
}

/**
 * Compile Files for APC
 * The function recurses through each directory and
 * compiles each *.php file through apc_compile_file
 * @param string $dir start directory
 * @return void
 */
function compile_files($dir)
{
    $dirs = glob($dir . DIRECTORY_SEPARATOR . '*', GLOB_ONLYDIR);
    if (is_array($dirs)) {
        foreach ($dirs as $subdir) {
            compile_files($subdir);
        }
    }

    $files = glob($dir . DIRECTORY_SEPARATOR . '*.php');
    if (is_array($files)) {
        foreach ($files as $file) {
            apc_compile_file($file);
        }
    }
}

compile_files('/path/to/dir');

Includes

The number of files you include can certainly limit your performance. The cost of each include is small, but it adds up once you include a large number of files, and remember that every request has to include those files again. If you include a large number of files in your bootstrap, merge them at deploy time into a single include file that is used by the bootstrap. Please note that I am not suggesting you do this in development, as that could lead to a maintenance nightmare, but as part of a release process run before deployment.
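As a rough sketch of such a deploy step (the helper name, file list and output path below are hypothetical, not from any particular release tool), the merge can be as simple as concatenating the files and stripping their PHP open/close tags:

```php
// Deploy-time sketch: merge the bootstrap's include files into one file.
function merge_includes(array $files, $output)
{
    $merged = "<?php\n";
    foreach ($files as $file) {
        if (!is_file($file)) {
            continue; // skip anything missing rather than merging garbage
        }
        $code = file_get_contents($file);
        // Strip the opening and closing tags so the pieces concatenate cleanly.
        $code = preg_replace('/^<\?php\s*/', '', $code);
        $code = preg_replace('/\?>\s*$/', '', $code);
        $merged .= "// --- " . basename($file) . " ---\n" . $code . "\n";
    }
    file_put_contents($output, $merged);
}

// e.g. merge_includes(array('lib/Db.php', 'lib/Cache.php'), 'bootstrap.merged.php');
```

Your bootstrap then includes the single merged file in production while development keeps including the originals.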

Avoid Loops when Possible

Loops can be expensive, especially the more operations you run inside them. For instance, take the following scenario: you need to retrieve a set of integer values from the browser, each corresponding to a record in the database, and then output those rows.

Poor Performance Looping Example

if (isset($_GET['ids'])) {
    foreach($_GET['ids'] as $id) {
        $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $id);
        $row = mysql_fetch_assoc($rs);
        print_r($row);
    }
}

High Performance Looping Example

if (isset($_GET['ids'])) {
    $ids = array_map('intval', $_GET['ids']);
    $ids = implode(',', $ids);
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id IN (' . $ids . ')');
    while($row = mysql_fetch_assoc($rs)) {
        print_r($row);
    }
}

The examples above simulate a common mistake you often see in code. The first example executes the same query once per ID, where the second uses only one query, which cuts the time considerably. To generate the numbers below I passed 15 IDs against a table that had only 40 records (6 of the IDs did not actually exist).

Number of IDS: 15
The Bad Way: 0.0044221878051758 seconds
The Good Way: 0.0011670589447021 seconds

Add quite a few more ID values and some joins in there and you have yourself a bottleneck building up.

memcache

memcache is a distributed memory object caching system, and PHP has a memcache extension for it. You can utilize memcache to store data that is used often; typically, you will store database results in memcache for quick access to ever-changing content. If you are thinking that the query caching in your RDBMS already covers this, especially with MySQL 4.x, read the section on the memcached front page.
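A minimal sketch of the pattern with the pecl/memcache extension might look like the following. The server address, key scheme and TTL are assumptions to adjust for your setup:

```php
// Build a cache key from the SQL text.
function cache_key($sql)
{
    return 'sql_' . md5($sql);
}

// Check memcache first; on a miss, hit the database and store the rows.
function cached_query($memcache, $sql, $ttl = 300)
{
    $key  = cache_key($sql);
    $rows = $memcache->get($key);
    if ($rows === false) {
        $rows = array();
        $rs = mysql_query($sql);
        while ($row = mysql_fetch_assoc($rs)) {
            $rows[] = $row;
        }
        $memcache->set($key, $rows, 0, $ttl); // flags = 0, expire in $ttl seconds
    }
    return $rows;
}

if (class_exists('Memcache') && function_exists('mysql_query')) {
    $memcache = new Memcache();
    $memcache->connect('127.0.0.1', 11211);
    $rows = cached_query($memcache, 'SELECT * FROM my_table');
}
```

Remember to invalidate (or simply let expire) any keys whose underlying data you change.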

Database Connections

Many applications connect to the database in their main file. You should not be doing this, as the initial connection to the database still takes time. The cost may be minor, but if you have quite a bit of content that does not need the database, skipping the connection saves your server precious resources and spares your database connections that do nothing. One way to implement a change like this is to use or create an abstraction layer that only connects on the first command sent to the database.
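A bare-bones sketch of such a lazy-connect wrapper (the class name is mine, and this is nowhere near a full abstraction layer) could look like this:

```php
// Lazy-connect wrapper: mysql_connect() is deferred until the first query.
class LazyDb
{
    private $host, $user, $pass, $name;
    private $link = null;

    public function __construct($host, $user, $pass, $name)
    {
        // Only remember the credentials -- no connection is opened yet.
        $this->host = $host;
        $this->user = $user;
        $this->pass = $pass;
        $this->name = $name;
    }

    public function isConnected()
    {
        return $this->link !== null;
    }

    public function query($sql)
    {
        if ($this->link === null) {
            // First query ever issued: connect now.
            $this->link = mysql_connect($this->host, $this->user, $this->pass);
            mysql_select_db($this->name, $this->link);
        }
        return mysql_query($sql, $this->link);
    }
}

$db = new LazyDb('localhost', 'user', 'secret', 'my_db');
// Pages that never call $db->query() never open a connection at all.
```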

RDBMS Performance Tuning

I am only going to offer a few pointers here, since each RDBMS implementation varies, but this is general advice that you should tend to follow.

Ensure Queries are using Indexes

Besides a terrible data model, poor indexing is the number one reason for poor performance on a database. You should be hitting your indexes with just about every query. Check your explain plan, ensure that the proper indexes are being used, and if they are not, adjust until the speed is going to be adequate.
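On MySQL, checking the plan is as simple as prefixing the query with EXPLAIN; a small sketch of doing that from PHP (table and column names are illustrative only):

```php
// Prefix a query with EXPLAIN so MySQL returns its execution plan
// instead of executing the query.
function explain_sql($sql)
{
    return 'EXPLAIN ' . $sql;
}

function show_plan($sql)
{
    $rs = mysql_query(explain_sql($sql));
    while ($row = mysql_fetch_assoc($rs)) {
        // Watch the "key" column (index chosen) and "rows" (rows examined);
        // a NULL key or a type of "ALL" means a full table scan.
        print_r($row);
    }
}

// show_plan('SELECT * FROM my_table WHERE my_id = 42');
```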

Further, ensure that you have enough memory to handle large indexes and more than just one at a time.

Use Fewer Joins

The more joins you make, the harder the database has to work, and a query can go from extremely fast to struggling under load. You shouldn’t be joining 10 tables together on a page that is hit a massive amount of the time.

Query Caching

Make sure to use the query caching on your database; it will increase performance for the queries that are run most often.

Archive, Archive, Archive

Archiving old data is a huge win, especially in tables that are ever growing and are getting into the millions of records. Archiving data that is used less often will speed up your query time, since it reduces the amount stored in the indexes and limits the number of rows you are potentially scanning through.
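A common shape for this is a cron or release step that copies old rows into an archive table and then deletes them from the live one. The sketch below just builds the two statements; the table, column and cutoff are all hypothetical, and you would want to run the pair inside a transaction or a maintenance window so they stay consistent:

```php
// Build the copy-then-delete statement pair for archiving rows older
// than $cutoff out of $table into $table_archive.
function build_archive_sql($table, $dateColumn, $cutoff)
{
    $copy = sprintf(
        "INSERT INTO %s_archive SELECT * FROM %s WHERE %s < '%s'",
        $table, $table, $dateColumn, $cutoff
    );
    $delete = sprintf(
        "DELETE FROM %s WHERE %s < '%s'",
        $table, $dateColumn, $cutoff
    );
    return array($copy, $delete);
}

// foreach (build_archive_sql('orders', 'created_at', '2007-01-01') as $sql) {
//     mysql_query($sql);
// }
```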

Many larger companies, when archiving old data, will produce aggregate statistics on it if those will be of use to the customer, client, etc.

HTTP Performance Tuning

As with the RDBMS section, there are multiple web servers, so here I will simply give some brief tips across a few different areas. If you are looking for server-specific tuning, such as for Apache, there are articles scattered throughout the web and information on the Apache website to give you more details.

Content Compression

By utilizing GZIP to compress the files you send to the browser, you can effectively slim the file sizes down, cutting down on your bandwidth, reducing the amount of time the server needs to stay connected to the user, and improving the user experience through faster loading times.
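With Apache you would typically reach for mod_deflate, but PHP can also do it per-script; here is a small sketch using ob_gzhandler, with the Accept-Encoding check pulled into a helper:

```php
// True when the client's Accept-Encoding header advertises gzip support.
function client_accepts_gzip($acceptEncoding)
{
    return strpos((string) $acceptEncoding, 'gzip') !== false;
}

if (extension_loaded('zlib')
    && isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && client_accepts_gzip($_SERVER['HTTP_ACCEPT_ENCODING'])) {
    // ob_gzhandler buffers the page and compresses it before sending.
    ob_start('ob_gzhandler');
} else {
    ob_start();
}
```

Only compress text-like content (HTML, CSS, JavaScript); images are already compressed and gain nothing.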

Combining Files

To cut down on the number of requests your web server needs to handle, you can effectively combine your most used CSS files and JavaScript files. Take all of the CSS files you utilize, concatenate them and strip out the whitespace. This saves the user time when downloading as well as saving you bandwidth. Do the same with your JavaScript files, cutting down wasted server resources, bandwidth, etc.
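A deploy-time sketch for the CSS side (file names are hypothetical; the regex-based minifier is deliberately crude and assumes well-formed CSS):

```php
// Strip comments and squeeze whitespace out of a CSS string.
function minify_css($css)
{
    $css = preg_replace('!/\*.*?\*/!s', '', $css);        // drop /* comments */
    $css = preg_replace('/\s+/', ' ', $css);              // collapse whitespace
    $css = preg_replace('/\s*([{};:,])\s*/', '$1', $css); // tighten punctuation
    return trim($css);
}

// Concatenate and minify a list of CSS files into one output file.
function combine_css(array $files, $output)
{
    $combined = '';
    foreach ($files as $file) {
        if (is_file($file)) {
            $combined .= minify_css(file_get_contents($file));
        }
    }
    file_put_contents($output, $combined);
}

// combine_css(array('reset.css', 'layout.css', 'theme.css'), 'site.min.css');
```

The same concatenate-and-strip approach works for JavaScript, though whitespace stripping there needs a real minifier rather than a regex.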

CSS Sprites

CSS sprites are the technique of taking images you would have sliced up, or a set of icons, combining them into a single image, and then using CSS to show the correct region. Take a look at the CSS Sprites article on A List Apart.

Utilize Not Modified Headers

If your content has not changed, you should be sending Not Modified (304) and/or Last-Modified headers. You want the browser to cache content that isn’t changing so that it does not waste your resources. When the user has a cached copy and it has not been modified, the browser only needs to check the headers, saving you bandwidth and server resources, and saving the user time.
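A minimal sketch of the conditional GET dance in PHP follows; here the content is assumed to be file-backed, so filemtime stands in for whatever timestamp your real content has:

```php
// True when the client's If-Modified-Since header is at least as new
// as our content's last-modified timestamp.
function is_not_modified($lastModified, $ifModifiedSince)
{
    return $ifModifiedSince !== null
        && strtotime($ifModifiedSince) >= $lastModified;
}

$lastModified    = filemtime(__FILE__); // assumption: file-backed content
$ifModifiedSince = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
if (is_not_modified($lastModified, $ifModifiedSince)) {
    header('HTTP/1.1 304 Not Modified');
    exit; // no body: the browser reuses its cached copy
}
```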

Limit Modules

The more modules you load, the more memory and resources your web server will take up. If you do not use a module, disable it or remove it. A second technique is to run two web servers, one for your dynamic content and one for the static content, so that static requests do not carry the overhead of loading PHP with the web server.

exit(0);

Now, as I stated, this was a very simplistic overview of some performance tuning options; it does not take things to the nth degree, as there is always more tuning that can be done. There are also many tools to benchmark the impact each of these changes makes on your site, but that is out of the scope of this post. If you have any suggestions or items I did not cover here, simply write a comment.


12 Comments
  1. Hi, great piece! I wrote an article about caching in PHP yesterday, what a coincidence. If anyone is interested take a look at http://www.ibuildings.nl/blog/

  2. With regard to the database advice above, connecting to the database in the main index.php file is perfectly acceptable if you use an appropriate DB abstraction layer (you do use an abstraction layer, don’t you?) like MDB2 and lazy connect (which does not actually do the connection work until the database is actually used):

    require_once("MDB2.php");
    $db = MDB2::factory($dsn);
    if (PEAR::isError($db)) {
        die("Error while connecting : " . $db->getMessage());
    }

    And on the point of reducing the number of joins and keeping database requests simple, while this isn’t a bad suggestion, a better one would be to write a stored procedure (assuming you’re using PostgreSQL or an up-to-date MySQL or SQL Server or similar). Because stored procedures are compiled by the database server, you gain back all the time otherwise lost in parsing the SQL code. It also means that you keep more of the SQL out of the PHP, which is generally a good idea.

    Database choice also has an impact here. It’s worth remembering that while only MySQL has a query cache, it doesn’t cache every query (queries must be identical to get the best speedup and there are whole classes of query which are not cached, and the cache has to be properly tuned as well); it’s worth even more to remember that once your number of simultaneous users rises above 1, PostgreSQL performs better than MySQL…

  3. Mark,

    I do use a DB abstraction layer that does this. The comment was simply to force the lazy connecting which the abstraction layer is doing. That was simply my point :)

    I will disagree with you on stored procedures. I think it depends on what you are doing with them. Logic should be contained in the application, not the database. However, I do understand the need, and it is a lifelong argument that will end up nowhere. :)

    Do you have anything to back up your findings about PostgreSQL and MySQL in that last comment? Both servers running the same pieces of software, and then tuned for the best performance by both sides?

  4. If you are good at sys admin and you happen to compile PHP from source, you can also compile a profiled PHP (-fprofile-generate, -fprofile-use) for approximately a 30% performance improvement (just do a make test and run the benchmark in the Zend folder between each compile). Oh, then there’s using Apache 2.2.8 with the event MPM and FastCGI PHP, mod_deflate anybody? Oh, and making sure you don’t use the stock filesystem session setup with PHP, as every page you hit will rewrite the session file entirely, even if no data changed, whenever you have called session_start()… there are a million things that you can do when things hit the fan IF you are good!

  5. Impressive compilation on how to optimize PHP web sites.
    Thank you very much for the effort.
    BTW, anyone knows what happened to TurckMMCache? At some point in time, it was even faster than APC.

  6. @Guti –
    Turck MMCache’s last release was 2003-11-04. In December of 2004 the project was forked as eAccelerator.

  7. I have yet to see a good comment/test around reducing include calls and function overhead.

    I.e.: by including all functions in one file, you make only one call to the filesystem to include it, but you are also loading all of those functions (which you may or may not need).

    I’d love to know what is the crossover point?

  8. @TuxLives –
    The answer is relatively simple here. If you are already including all of those files and functions, you are already incurring that overhead, plus slightly more, since each include also takes a hit from disk.

    This becomes easier to see when you are including a bunch of files from a framework. Since PHP is not compiled, you actually have to include those files on each and every request. One of my installations of the Zend Framework includes up to 85 files. Reducing this to 1-3 files speeds up the site drastically. However, this is a little tricky, as you have to work out all of the dependencies so that if class b extends class a, class a is included first (something you can’t easily do with the get_required_files() function).

    Each individual benchmark would be considerably different since you may have more or less functions than ones that would be tested or more or less files with class definitions. What I attempt to do, is to group the common ones I utilize on every request into a single file and then deal with the rest on a one off basis.

