PHP Performance Series: Maximizing Your MySQL Database

by Mike Willbanks on June 18th, 2008

In the first article of the PHP Performance Series, I focused on PHP Caching Techniques. This time I want to talk about maximizing your database. This article deals mostly with MySQL; however, you should be able to apply many of the concepts even if you do not use MySQL directly.

Application SQL Performance

Application-level SQL performance is not so much about the performance of the SQL query itself, but rather how it has been designed to work in the application. Many of the items I will address in this area involve designing your application to make fewer queries, thus improving scalability and likely performance. However, performance does not always equal scalability, just as scalability does not always equal performance.

If you have read my blog before, you may notice that I have used some of this content previously; I am including it in this section for completeness.

Lazy Connections

Utilizing lazy connections is a great step for applications that do not need the database for the full request, or may not need it at all. The concept here is to not initialize the connection to the database unless it is absolutely essential, keeping your connection pool free of massive amounts of sleeping connections.

Simple Lazy Connection Example

While not a full example, I believe this shows a simple technique for handling lazy connections.

class My_Db {

	private $_connected = false;
	private $_connection;
	private $_host;
	private $_user;
	private $_pass;
	private $_db;

	public function connect($host, $user, $pass, $db) {
		// Only store the connection parameters; no connection is opened yet
		$this->_host = $host;
		$this->_user = $user;
		$this->_pass = $pass;
		$this->_db   = $db;
	}
	private function _connect() {
		// Open the connection the first time it is actually needed
		if ($this->_connection = mysql_connect($this->_host, $this->_user, $this->_pass)) {
			mysql_select_db($this->_db, $this->_connection);
			$this->_connected = true;
		}
	}
	public function query($query) {
		if (!$this->_connected) {
			$this->_connect();
		}
		return mysql_query($query, $this->_connection);
	}
}

Iterating Queries

This is one of the most common issues I see when looking over another developer's code, or even in several of the open source projects out there. I am defining an iterating query as a query that executes inside a loop. These can be very expensive and often are not needed at all.

An Iterating Query Example

if (isset($_GET['ids'])) {
    foreach($_GET['ids'] as $id) {
        $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $id);
        $row = mysql_fetch_assoc($rs);
        print_r($row);
    }
}

Fixing The Iterating Query Example

if (isset($_GET['ids'])) {
    $ids = array_map('intval', $_GET['ids']);
    $ids = implode(',', $ids);
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id IN (' . $ids . ')');
    while($row = mysql_fetch_assoc($rs)) {
        print_r($row);
    }
}

Need Based Selects

There is no need to select information that you do not need. For starters, it increases the memory usage of the request as well as the I/O time spent fetching record data from columns that are not being utilized and transferring them through PHP. The more data you select, the slower the query and the larger the memory footprint. This is especially true with the TEXT and BLOB column types.

SELECT * IS BAD!

This is actually bad in two different areas. First, and of higher concern: what is actually being utilized from the query when you need to change something in the application? Say you have a large application with thousands of files that use a select query; you will have no idea which columns are actually used without researching each and every area of the application that contains the wonderful SELECT *. Basically, your maintainability slowly dies. Second, as I stated before, there is also a performance and memory hit. Simply stated: do not use SELECT *.

One question that is commonly asked in response: what if I am using all of the columns? Even so, will you still be using all of the columns in 3 months, 6 months, 1 year, 5 years?
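A Need Based Select Example

As a minimal sketch (my_title is a hypothetical column standing in for whatever the page actually displays), selecting only what you use keeps both the result set and the PHP memory footprint small:

// Wasteful: pulls every column, including any TEXT/BLOB data the page never displays
$rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $_GET['id']);

// Better: fetch only the columns this page actually uses
$rs = mysql_query('SELECT my_id, my_title FROM my_table WHERE my_id = ' . (int) $_GET['id']);
$row = mysql_fetch_assoc($rs);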

Use the Correct Data Type (Don’t Quote Everything)

Yes, utilizing the correct data type in your query does matter. Using the wrong one can cause the database to miss indexes or come back with invalid results. Beyond that, it is slower, since the database has to convert the data into the correct type.

Example of Incorrect Data Types

if (isset($_GET['id'])) {
    $id = mysql_real_escape_string($_GET['id']);
    $rs = mysql_query("SELECT * FROM my_table WHERE my_id = '{$id}'");
}

Example of Correct Data Types

if (isset($_GET['id'])) {
    $id = (int) $_GET['id'];
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . $id);
}

By the way, the second one is also quicker because we only had to type cast to an integer instead of running the value through mysql_real_escape_string.

Hierarchical Data

When building navigation trees, you should ensure that you are using a proper technique and not hammering your database for it. Further, cache the tree instead of hitting the database on every request! There is already a great article over at MySQL on Managing Hierarchical Data, so I am not going to go deep into this. But any time you are managing a tree of data, please spare other developers and handle it correctly.
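A Nested Set Query Sketch

As a rough sketch of the nested set technique covered in that article (the category table and its lft/rgt columns are assumptions for illustration), an entire subtree can be fetched with a single query instead of one query per level:

// Fetch a full subtree in one query using the nested set boundaries (lft/rgt)
// rather than recursively querying for the children of each node.
$rs = mysql_query(
    'SELECT node.category_id, node.name
       FROM category AS node, category AS parent
      WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.category_id = ' . (int) $parentId . '
      ORDER BY node.lft'
);
while ($row = mysql_fetch_assoc($rs)) {
    print_r($row);
}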

Database Design

Database design is typically the first area where you find problems start. A bad data model can plague your application with both performance and maintainability concerns. However, also note that the more performance driven you make your database, the less maintainable it can become (debatable – it really depends on size and scale).

Please note that this is not an all-encompassing list; I wanted to give more of a developer's point of view on optimization techniques in database design rather than provide a full guide. If you are looking for deeper information, I invite you to go to the MySQL Manual on Database Design. Feel free to add recommendations in the comments.

Normalization

Normalization is a technique for minimizing the duplication of information. This is typically the best thing to start with: if you are currently having problems and your database is not well normalized, focus on that first. Likely you have tables, columns or data that should not be structured the way they currently are.

Columns instead of Table Example

A common problem in many applications is that they store data in extra columns instead of creating a one-to-many table. Take for instance the following table:

app_user
Column Name        Column Type
user_id            integer
user_email         varchar
user_website       varchar
user_website_2     varchar
user_website_3     varchar

What likely happened here is that a user requested the ability to have up to 3 websites, and the developer working on it figured there would never need to be any more. This is a prime candidate for normalization and further modularity.

Table instead of Columns Example

Using a table instead of columns would better support this feature as well as allow for further growth in the future.

app_user
Column Name        Column Type
user_id            integer
user_email         varchar

app_user_website
Column Name        Column Type
website_id         integer
user_id            integer
website_url        varchar

You might be thinking: what does this have to do with performance? It improves maintainability, makes your indexes easier to manage, and lowers the row size of the initial table. Table size definitely matters once you start adding indexes to columns you should not need to index at all. Take both examples and attempt to find the users who have not entered a website. With the column-based design this becomes very slow unless you add indexes on each of the user_website columns. With the table-based design you can simply run SELECT au.user_id FROM app_user au WHERE au.user_id NOT IN (SELECT DISTINCT user_id FROM app_user_website), relying on the index from the foreign key you would likely already have defined on user_id in the app_user_website table.
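Sketch of the Normalized Tables

Here is a rough sketch of the table-based design as DDL (the column sizes and engine are assumptions); the foreign key on user_id gives you the index needed for the lookup above without adding anything extra:

CREATE TABLE app_user (
    user_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_email VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE app_user_website (
    website_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id     INT UNSIGNED NOT NULL,
    website_url VARCHAR(255) NOT NULL,
    -- the foreign key ensures an index exists on user_id
    FOREIGN KEY (user_id) REFERENCES app_user (user_id)
) ENGINE=InnoDB;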

Denormalization

Denormalization is the concept of copying data into other tables in order to reduce the number of joins that you need to perform. These copies should typically be maintained with triggers, and if you are unable to use triggers, please ensure that anything that creates, modifies or deletes the data does so through a layer of encapsulation. Otherwise you will end up with stale records.

You should only do this when you have exhausted all of the other routes: checking and creating indexes, confirming the data access is really needed, and verifying there is no other method that meets your requirements.
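A Denormalization Trigger Sketch

As a minimal sketch of keeping a denormalized value in sync with a trigger (the website_count column on app_user is hypothetical, added purely so a count can be displayed without a join or subquery):

-- Keep the copied/aggregated value up to date whenever the source table changes
CREATE TRIGGER trg_user_website_insert
AFTER INSERT ON app_user_website
FOR EACH ROW
    UPDATE app_user
       SET website_count = website_count + 1
     WHERE user_id = NEW.user_id;

-- A matching AFTER DELETE trigger should decrement the count; without both,
-- the denormalized column goes stale.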

Table Types

Use the correct table type (storage engine) for what you are doing or attempting to do. I suggest setting up a matrix of what you need each table to do, as well as your database as a whole. This first step is often neglected by many developers.

Example List
Feature                   Option
Read vs. Write > 15%      Yes/No
Transactions              Yes/No
Foreign Key Support       Yes/No
Full-Text Indexes         Yes/No

The list above is by no means a full list, but you should document what you need and what you are using the database for. For example, if you are reading and writing to the same database table, MyISAM is likely a bad idea once writes (or reads) make up more than roughly 15% of the workload. MyISAM tables lock at the table level, so a long read followed by an insert will block any further reads until that insert has completed. So make a list and figure out what needs to be there. It will certainly help you as your database grows.
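A Table Type Sketch

As a rough sketch (both tables here are hypothetical), the engine is chosen per table when it is created, so the matrix above can be applied table by table:

CREATE TABLE app_order (
    order_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id     INT UNSIGNED NOT NULL,
    order_total DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;   -- needs transactions, foreign keys and row-level locking

CREATE TABLE app_article (
    article_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body       TEXT,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;   -- full-text indexes require MyISAM at the time of writing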

SQL Query Optimizations

Optimizing your SQL is not rocket science; however, this seems to be one area where applications start crumbling once they hit popularity. Simply put, as features are developed and the database changes over time, the effect on existing queries is rarely taken into account.

A simple rule to follow: design your queries around your database architecture and its current rules; when that cannot be achieved, refactor and adjust the database so it can handle the new situation.

The Simple Rules

  • Use your explain/execution plan (see the sketch after this list)
  • The less joins the better
  • Ensure you are utilizing your indexes (see first bullet)
  • Temporary tables can be good when doing operations on complex data sets
  • Stay away from derived tables and non-materialized views (see above bullet)
  • Roll up data that can be aggregated
  • Select the columns you need, not SELECT *
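
As a minimal sketch of the first bullet (reusing the my_table example from earlier), run a query through EXPLAIN before you ship it and check whether an index is actually being used:

EXPLAIN SELECT my_id
          FROM my_table
         WHERE my_id IN (1, 2, 3);

-- In the output, check the "key" and "rows" columns: a NULL "key" means no index
-- is being used, and a large "rows" estimate usually means a table scan that calls
-- for a new index or a rewritten query.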

If you are looking for ways to better optimize your queries, again, please see the MySQL Manual on Query Optimization.

Exit(0);

You may have noticed that it took me quite a while to push this blog post out. Between being in the middle of purchasing a home, some resource constraints at work, a couple of side projects and maintaining my life, I just didn't have much time to finish writing a more complete post.

To go a little further, I've cut down the contents of this post; you may be sick of reading already, and the amount of information that could have been written here could easily fill a full book if you wanted to get into each and every aspect. I figured for my sanity as well as yours I should cut it shorter. If you have any information to add, please submit comments.

From MySQL, PHP

14 Comments
  1. Thanks for the helpful article, especially the part about lazy connections!
    Good luck juggling all the important things in your life :).

  2. Great Post!

    It’s always great to review the concepts, and sometimes to learn new ones.

    Will be waiting for the next article on this series!

  3. Joachim Schoder

    You could have written a book, and perhaps you should. You could go into more detail, and you would have at least one customer.

  4. CyberGhost

    Good article. However, what I am missing here is some information about the mysqli extension and prepared statements. Also, it might be a good idea to include a few words about the InnoDB vs. MyISAM engines, as they work a bit differently when it comes to high volumes of data.

  5. CyberGhost –
    That is definitely true… I should have included the differences between the different MySQL extensions such as mysql, mysqli, mysqlnd and pdo_mysql. However, that can get a bit hairy when looking at the differences between all of them. I will likely dedicate a post to the differences in the future.

    MyISAM vs. InnoDB is mainly about the enterprise features that many people will want or need. The InnoDB feature set includes transactions, foreign keys, internal id mapping and the like. However, for pure reading or writing MyISAM is typically faster, but when you are doing a high volume of both reads and writes in a single database, InnoDB becomes essential for its row-level locking vs. MyISAM's table locking.

    I certainly could have gone into indexing as well and how to create a better index; I suppose there is just a bit too much information to cover in a blog post. :)

  6. Cool article! I do not completely agree on “Iterating Queries”: “IN” can work poorly with indexes and might therefore lock the table for a long time, so the first example might not be slower in any case. Denormalization can cause a lot of trouble, so you should really only use it if every other possibility (especially optimization and indexes) fails. And sometimes I think query optimization is rocket science … ;-)

  7. Leo –
    You should likely double check the logic there… typically IN does a fantastic job, certainly where you are using primary keys. We utilize this quite a bit and rarely ever miss the index. In our benchmarks specifically we have seen an iterating query go from 300 queries in 2 seconds down to 1 query completing in under a second.

    I believe it really depends on the nature of the beast.

  8. Hi Mike,

    Nice article. The following five points would also be good advice.

    1) Make sure the MySQL query cache is on, and that it has a decent amount of RAM assigned to it.
    2) Make sure you properly index your tables.
    3) The slow query log :)
    4) Add code to your app to monitor the number of queries made per page. Once you know the number, reduce it!
    5) JOINs can be slow, and it can be quicker to do two separate queries instead.

    Best regards,
    Stu

  9. As I said: “might not be slower in any case”. So it’s pretty hard to say something that holds for every case in MySQL. I tried it out again on our database and “IN” uses range and “using where; using index”. But for a simple query it might still be faster.

  10. Stuart –
    I initially had notes in my notepad to address a few of those concepts… that’s what I get for attempting to finish writing this thing so late. I figured if I didn’t post it last night I wouldn’t be posting it for another month… this article had been half written for the last 2 months. :)

  11. CyberGhost

    Mike, thanks for the quick follow-up about MyISAM vs. InnoDB. Also, I didn’t know that the IN() syntax is so fast on primary keys. I suppose we learn new things every day :)

  12. Hello. Although a great article, I don’t agree with your lazy solution for connecting to the database, because in the following statement you always create a connection to the database:

    private function _connect() {
        if ($this->_connection = mysql_connect()) {
            $this->_connected = true;
        }
    }

    In every check you connect to the database. It should be something like:

    private function _connect() {
        if (is_null($this->_connection)) {
            $this->_connection = mysql_connect();
            $this->_connected = true;
        }
    }

    *For clarity I skipped the check for whether MySQL is actually connected.

    In this example you see that every time you call “_connect”, PHP doesn’t connect to the SQL server if a connection is already set.

  13. Mangesh

    Nice Article…
    Thanks a lot!!

    I am a bit confused about a scenario. I have to select 42 columns out of 52 from a table, including the primary key columns. Which will be better: selecting the 42 columns explicitly or using SELECT *? At what point does an explicit SELECT stop performing better than SELECT *?

  14. Joost Pluijimers,

    The lazy loading was done correctly. The function _connect is private and therefore may only be called by the class itself. The class only calls _connect when you run a query, and in addition the query function checks whether the connection has already been made.
