Helping to Pay for Drizzle.org

October 27, 2008

Well, it has certainly been some time since I last blogged; however, that does not mean I have not been keeping track of what is going on in the industry. If you have not heard about the Drizzle Project, then you must be living in a cave! ;)

The Drizzle project is building a database optimized for cloud computing and web applications. It is being designed for massive concurrency on modern multi-CPU/multi-core architectures. The code was originally derived from MySQL.

Mike Shadle recently made a contribution by negotiating and purchasing Drizzle.org for $1K. If you haven't contributed back to open source, here is a chance to show your appreciation. Help support this project and help pay for Drizzle.org!

PHP Performance Series: Maximizing Your MySQL Database

June 18, 2008

In the first article of the PHP Performance Series, I focused on PHP Caching Techniques. This time I want to talk about maximizing your database. This article deals mostly with MySQL; however, many of the points apply even if you do not use MySQL directly.

Application SQL Performance

Application-level SQL performance is different from the performance of the SQL query itself; it is about how the queries have been designed to work within the application. Many of the items in this area are about designing your application to make fewer queries, which improves scalability and likely performance. However, performance does not always equal scalability, and scalability does not always equal performance.

If you have read my blog before, you may recognize some of this content; I am including it in this section for completeness.

Lazy Connections

Lazy connections are a great step for applications that do not need the database for the full request, or may not need it at all. The idea is to not open a connection to the database until it is absolutely needed, which keeps your connection pool free of large numbers of sleeping connections.

Simple Lazy Connection Example

While not a full example, I believe this shows a simple technique for handling lazy connections.

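class My_Db {
 
	private $_connected = false;
	private $_connection;
	private $_params = array();
 
	public function connect($host, $user, $pass, $db) {
		// only store the connection settings; no connection is opened yet
		$this->_params = array($host, $user, $pass, $db);
	}
	private function _connect() {
		// open the real connection using the settings stored by connect()
		list($host, $user, $pass, $db) = $this->_params;
		if ($this->_connection = mysql_connect($host, $user, $pass)) {
			mysql_select_db($db, $this->_connection);
			$this->_connected = true;
		}
	}
	public function query($query) {
		// connect on the first query only
		if (!$this->_connected) {
			$this->_connect();
		}
		return mysql_query($query, $this->_connection);
	}
}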
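To see the lazy behavior without a MySQL server, here is a DB-free sketch of the same idea. The Lazy_Resource class and its factory closure are hypothetical names used only for illustration; in My_Db above, the factory's job is done by _connect().

```php
<?php
// Hypothetical, DB-free illustration of lazy initialization.
class Lazy_Resource {
    private $factory;
    private $resource = null;
    private $created = 0;

    public function __construct(callable $factory) {
        // remember how to create the resource, but do not create it yet
        $this->factory = $factory;
    }

    public function get() {
        // create the resource on first use only, then reuse it
        if ($this->resource === null) {
            $this->resource = call_user_func($this->factory);
            $this->created++;
        }
        return $this->resource;
    }

    public function timesCreated() {
        return $this->created;
    }
}

$db = new Lazy_Resource(function () {
    return 'connection'; // stand-in for mysql_connect(...)
});
echo $db->timesCreated(), "\n"; // 0: nothing opened yet
$db->get();
$db->get();
echo $db->timesCreated(), "\n"; // 1: opened once, then reused
```

A request that never calls get() never opens a connection, which is what keeps sleeping connections out of the pool.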

Iterating Queries

This is one of the most common problems I see when looking over another developer's code, or even several of the open source projects out there. I define an iterating query as a query that executes inside a loop. These can be very expensive and are often unnecessary.

An Iterating Query Example

if (isset($_GET['ids'])) {
    foreach($_GET['ids'] as $id) {
        $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $id);
        $row = mysql_fetch_assoc($rs);
        print_r($row);
    }
}

Fixing The Iterating Query Example

if (!empty($_GET['ids']) && is_array($_GET['ids'])) {
    // cast every id to an integer, then fetch all rows with a single query
    $ids = implode(',', array_map('intval', $_GET['ids']));
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id IN (' . $ids . ')');
    while ($row = mysql_fetch_assoc($rs)) {
        print_r($row);
    }
}
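One thing to keep in mind with IN (...) is that, without an ORDER BY, MySQL does not promise the rows come back in the order of the ids you passed. If the calling code needs to look up a row per id, a minimal sketch is to re-key the fetched rows in PHP. rows_by_id() is a hypothetical helper, and the sample rows stand in for what mysql_fetch_assoc() would return (values arrive as strings).

```php
<?php
// Hypothetical helper: re-key rows from a single IN (...) query by their id column.
function rows_by_id(array $rows, $idColumn) {
    $byId = array();
    foreach ($rows as $row) {
        // mysql_fetch_assoc() returns strings, so cast the id for integer keys
        $byId[(int) $row[$idColumn]] = $row;
    }
    return $byId;
}

// Sample rows standing in for the result of the IN (...) query.
$rows = array(
    array('my_id' => '7', 'name' => 'seven'),
    array('my_id' => '3', 'name' => 'three'),
);
$byId = rows_by_id($rows, 'my_id');

// Walk the requested ids in their original order; ids with no row are skipped.
foreach (array(3, 7, 9) as $id) {
    if (isset($byId[$id])) {
        echo $byId[$id]['name'], "\n";
    }
}
```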

Need Based Selects

There is no need to select information that you do not need. For starters, it increases the memory usage of the request, as well as the I/O time spent fetching data from columns that are never used and transferring it to PHP. The more data you select, the slower the query and the larger the memory footprint. This is especially true with the TEXT and BLOB column types.

SELECT * IS BAD!

This is actually bad in two different areas. First, and of greater concern: if you ever need to change something in the application, how do you know which columns from the query are actually being used? Say you have a large application with thousands of files that run a SELECT * query; you will have no idea where those values are used without researching each and every area of the application that has the wonderful SELECT *. Basically, your maintainability slowly dies. Second, as I stated before, there is also a performance and memory hit. Simply stated, do not use SELECT *.

One question that is commonly asked after this comment is: what if I am using all of the columns? Even so, are you going to be using all of the columns in 3 months, 6 months, 1 year, 5 years?
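Listing columns explicitly does not have to be painful. A minimal sketch, assuming a hypothetical helper named select_columns() and the app_user columns used later in this post: the column list is spelled out at each call site, so searching the code for a column name finds every query that uses it.

```php
<?php
// Hypothetical helper: build a SELECT with an explicit column list instead of SELECT *.
function select_columns($table, array $columns, $where = '') {
    // quote identifiers with MySQL backticks, doubling any backtick inside a name
    $quoted = array();
    foreach ($columns as $column) {
        $quoted[] = '`' . str_replace('`', '``', $column) . '`';
    }
    $sql = 'SELECT ' . implode(', ', $quoted)
         . ' FROM `' . str_replace('`', '``', $table) . '`';
    if ($where !== '') {
        $sql .= ' WHERE ' . $where;
    }
    return $sql;
}

$userId = '42';
echo select_columns('app_user', array('user_id', 'user_email'), 'user_id = ' . (int) $userId), "\n";
// SELECT `user_id`, `user_email` FROM `app_user` WHERE user_id = 42
```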

Use the Correct Data Type (Don’t Quote Everything)

Yes, using the correct data type in your query does matter. The wrong type can cause the database to miss indexes or return incorrect results. On top of that, it is slower, since the database has to convert the data into the correct type.

Example of Incorrect Data Types

if (isset($_GET['id'])) {
    $id = mysql_real_escape_string($_GET['id']);
    $rs = mysql_query("SELECT * FROM my_table WHERE my_id = '{$id}'");
}

Example of Correct Data Types

if (isset($_GET['id'])) {
    $id = (int) $_GET['id'];
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . $id);
}

By the way, the second one is also quicker because we only have to cast the value to an integer instead of running it through mysql_real_escape_string.
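The integer cast also doubles as input validation. Here is a quick, DB-free check of what the cast does to hostile input; build_select_by_id() is a hypothetical name, and the query it builds mirrors the "correct data types" example.

```php
<?php
// Hypothetical helper mirroring the "correct data types" example.
function build_select_by_id($rawId) {
    // an (int) cast always yields a plain integer: leading digits are kept,
    // anything non-numeric becomes 0, so no quotes or SQL can survive
    $id = (int) $rawId;
    return 'SELECT * FROM my_table WHERE my_id = ' . $id;
}

echo build_select_by_id('15'), "\n";             // SELECT * FROM my_table WHERE my_id = 15
echo build_select_by_id('15 OR 1=1'), "\n";      // SELECT * FROM my_table WHERE my_id = 15
echo build_select_by_id("abc' OR '1'='1"), "\n"; // SELECT * FROM my_table WHERE my_id = 0
```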

Hierarchical Data

When building navigation trees, make sure you are using a proper technique rather than pushing the work onto your database. Further, cache the tree instead of hitting the database for it on every request! There is already a great article over at MySQL on Managing Hierarchical Data, so I am not going to go deep into this. But any time you are managing a tree of data, please spare other developers and do it correctly.
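For a tree stored as an adjacency list (each row holds its parent's id), one cheap approach is to fetch all rows with a single query, build the tree in PHP, and cache the result, rather than running one query per level. A minimal sketch; the column names id, parent_id and title, and the sample rows, are assumptions for illustration.

```php
<?php
// Build a nested tree from flat adjacency-list rows in one pass.
// Assumed columns: id, parent_id (null for top-level items), title.
function build_tree(array $rows) {
    $nodes = array();
    foreach ($rows as $row) {
        $row['children'] = array();
        $nodes[$row['id']] = $row;
    }
    $tree = array();
    foreach ($nodes as &$node) {
        if ($node['parent_id'] === null) {
            $tree[] = &$node; // top-level item
        } elseif (isset($nodes[$node['parent_id']])) {
            // attach by reference so children added later still show up
            $nodes[$node['parent_id']]['children'][] = &$node;
        } // rows whose parent is missing are skipped
    }
    unset($node);
    return $tree;
}

// Print the tree with two spaces of indentation per level.
function print_tree(array $nodes, $depth = 0) {
    foreach ($nodes as $node) {
        echo str_repeat('  ', $depth), $node['title'], "\n";
        print_tree($node['children'], $depth + 1);
    }
}

$rows = array(
    array('id' => 1, 'parent_id' => null, 'title' => 'Home'),
    array('id' => 2, 'parent_id' => 1, 'title' => 'Products'),
    array('id' => 3, 'parent_id' => 2, 'title' => 'Widgets'),
    array('id' => 4, 'parent_id' => 1, 'title' => 'About'),
);
$tree = build_tree($rows);
print_tree($tree);
```

The built $tree is what you would cache (for example in APC or memcached), so the query runs only when the cache is empty or the tree changes.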

Database Design

Database design is typically the first place where problems start. A bad data model can plague your application with both performance and maintainability concerns. However, note that the more performance-driven you make your database, the less maintainable it can become (debatable; this really depends on size and scale).

Please note that this is not an all-encompassing list; I wanted to give more of a developer's point of view on optimization techniques in database design rather than provide a full guide. If you are looking for deeper information, I invite you to go to the MySQL Manual on Database Design. Feel free to add recommendations in the comments.

Normalization

Normalization is a technique for minimizing the duplication of information. It is typically the best place to start: if you are currently having problems and your database is not well normalized, focus on that first. You likely have tables, columns or data that do not need to be structured the way they currently are.

Columns instead of Table Example

A common problem in many applications is that they store repeating data in extra columns instead of in a separate one-to-many table. Take for instance the following table:

app_user
Column Name       Column Type
user_id           integer
user_email        varchar
user_website      varchar
user_website_2    varchar
user_website_3    varchar

What you are likely seeing here is that a user asked for up to 3 websites, and the developer working on it figured there would never need to be more. This is a prime candidate for better modularity as well as normalization.

Table instead of Columns Example

Using a table instead of columns would better support this feature as well as allow for further growth in the future.

app_user
Column Name       Column Type
user_id           integer
user_email        varchar

app_user_website
Column Name       Column Type
website_id        integer
user_id           integer
website_url       varchar

You might be thinking, what does this have to do with performance? It comes down to maintainability, better use of your indexes, and a smaller initial table. Table size definitely matters once you start adding indexes to columns that should not need them. Take both designs and try to find users who have not entered a website. In the first design you have to check all three user_website columns, and without indexes on them this becomes very slow. In the second design, a simple select au.user_id from app_user au where au.user_id NOT IN (select distinct user_id from app_user_website) does the job using the user_id foreign key that you would likely have defined in the app_user_website table, with no additional indexes needed.
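Moving existing data from the column design to the table design is mostly mechanical. A minimal sketch, assuming rows fetched from the original app_user table; website_rows() is a hypothetical helper, and website_id is left out on the assumption that it is an AUTO_INCREMENT column.

```php
<?php
// Hypothetical helper: turn one wide app_user row into app_user_website rows.
function website_rows(array $user) {
    $rows = array();
    foreach (array('user_website', 'user_website_2', 'user_website_3') as $column) {
        // skip empty slots, so a user without a website gets no rows at all
        if (isset($user[$column]) && $user[$column] !== '') {
            $rows[] = array(
                'user_id'     => (int) $user['user_id'],
                'website_url' => $user[$column],
            );
        }
    }
    return $rows;
}

$user = array(
    'user_id'        => '12',
    'user_email'     => 'jane@example.com',
    'user_website'   => 'http://example.com',
    'user_website_2' => null,
    'user_website_3' => 'http://example.org',
);
print_r(website_rows($user));
```

Each returned row becomes one INSERT into app_user_website, after which the three user_website columns can be dropped from app_user. A fourth website later is simply another row.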

Denormalization

Denormalization is the practice of copying data into other tables in order to reduce the number of joins you need. The copies should almost always be kept in sync with triggers; if you are unable to use triggers, make sure that anything that creates, modifies or deletes the data goes through a single layer of encapsulation. Otherwise you will end up with stale records.

You should only need to do this once you have exhausted all of the other routes: checking and creating indexes, confirming that the data access is really needed, and finally confirming that there is no other way to meet your needs.
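When triggers are not an option, the encapsulation layer can be as small as one function that every write goes through. A minimal sketch with a hypothetical schema: app_post keeps a denormalized copy of user_email in post_user_email so post listings need no join. The function only builds the statements; running them together (ideally in one InnoDB transaction) is left to your database layer.

```php
<?php
// Hypothetical schema: app_post.post_user_email copies app_user.user_email.
// Every email change must go through this one function so the copy never goes stale.
function change_user_email_statements($userId, $email, callable $escape) {
    $userId = (int) $userId;
    $quoted = "'" . call_user_func($escape, $email) . "'";
    return array(
        // the source of truth
        "UPDATE app_user SET user_email = {$quoted} WHERE user_id = {$userId}",
        // the denormalized copy, updated in the same place
        "UPDATE app_post SET post_user_email = {$quoted} WHERE user_id = {$userId}",
    );
}

// addslashes is only a DB-free stand-in here; with a live connection,
// pass 'mysql_real_escape_string' instead.
$statements = change_user_email_statements('5', 'new@example.com', 'addslashes');
foreach ($statements as $sql) {
    echo $sql, "\n";
}
```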

Table Types

Use the correct table type for what you are doing or attempting to do. I suggest setting up a matrix of what you need your table to do as well as your database as a whole. This first step is often neglected by many developers.

Example List
Feature                   Option
Read vs. Write > 15%      Yes/No
Transactions              Yes/No
Foreign Key Support       Yes/No
Full-Text Indexes         Yes/No

The list above is by no means complete, but you should document what you need and what you are using the database for. For example, if you are reading from and writing to the same table, MyISAM is likely a bad idea once more than about 15% of the table's queries are of the less common kind (for example, writes on a read-heavy table). MyISAM uses table-level locking: a long-running read followed by an insert makes the insert wait for a table lock, and further reads then queue behind that insert until it has completed. So make a list and figure out what needs to be there. This will certainly help you as your database grows.
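Once the matrix exists, it can be checked against what each engine offers. A minimal sketch: the capability table reflects MyISAM and InnoDB as of MySQL 5.0/5.1 (InnoDB only gained FULLTEXT indexes in MySQL 5.6), and the need names and engine_conflicts() are hypothetical.

```php
<?php
// Hypothetical check of a needs matrix against engine capabilities (MySQL 5.0/5.1 era).
function engine_conflicts($engine, array $needs) {
    $capabilities = array(
        // MyISAM: table-level locking, no transactions or foreign keys, has FULLTEXT
        'MyISAM' => array('mixed_read_write' => false, 'transactions' => false,
                          'foreign_keys' => false, 'fulltext' => true),
        // InnoDB: row-level locking, transactions, foreign keys, no FULLTEXT before 5.6
        'InnoDB' => array('mixed_read_write' => true, 'transactions' => true,
                          'foreign_keys' => true, 'fulltext' => false),
    );
    $conflicts = array();
    foreach ($needs as $need => $required) {
        // a need is a conflict when it is required but the engine lacks it
        if ($required && empty($capabilities[$engine][$need])) {
            $conflicts[] = $need;
        }
    }
    return $conflicts;
}

$needs = array('mixed_read_write' => true, 'transactions' => true,
               'foreign_keys' => false, 'fulltext' => false);
print_r(engine_conflicts('MyISAM', $needs)); // mixed_read_write, transactions
print_r(engine_conflicts('InnoDB', $needs)); // no conflicts
```

An empty result means the engine covers every "Yes" in your matrix; a need that no engine covers (say, full-text search on a transactional table before 5.6) tells you the design needs rethinking rather than a different table type.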

SQL Query Optimizations

Optimizing your SQL is not rocket science; however, this seems to be one area where applications start crumbling once they hit popularity. As features are developed and the database changes over time, the effect those changes might have on existing queries is rarely taken into account.

A simple rule to follow: design your queries around your current database architecture and rules, and when that cannot be done, refactor and adjust the database to handle the new situation.

The Simple Rules

  • Use your explain/execution plan
  • The fewer joins the better
  • Ensure you are utilizing your indexes (see first bullet)
  • Temporary tables can be good when doing operations on complex data sets
  • Stay away from derived tables and non-materialized views (see above bullet)
  • Roll up data that can be aggregated
  • Select the columns you need, not SELECT *
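Rolling up data means storing precomputed aggregates (for example, daily totals) so that reports read a few summary rows instead of scanning every raw row. A minimal in-memory sketch of the aggregation step; the page-view rows, their columns, and the idea of writing the result into a summary table are assumptions for illustration, and the same grouping is often done in SQL with GROUP BY.

```php
<?php
// Aggregate hypothetical raw page views into one row per day and page.
// Assumed columns: page_id, viewed_at ('Y-m-d H:i:s').
function rollup_daily(array $views) {
    $totals = array();
    foreach ($views as $view) {
        $day = substr($view['viewed_at'], 0, 10); // keep 'YYYY-MM-DD'
        $key = $day . '|' . $view['page_id'];
        if (!isset($totals[$key])) {
            $totals[$key] = array('day' => $day, 'page_id' => (int) $view['page_id'], 'views' => 0);
        }
        $totals[$key]['views']++;
    }
    return array_values($totals);
}

$views = array(
    array('page_id' => '1', 'viewed_at' => '2008-06-17 09:15:00'),
    array('page_id' => '1', 'viewed_at' => '2008-06-17 22:40:10'),
    array('page_id' => '2', 'viewed_at' => '2008-06-17 10:05:00'),
    array('page_id' => '1', 'viewed_at' => '2008-06-18 08:00:00'),
);
print_r(rollup_daily($views));
```

Each rolled-up row would be written to a summary table once (say, from a nightly cron job), and reports would then query that table instead of the raw log.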

If you are looking for ways to better optimize your queries, again, please go to the MySQL Manual on Query Optimization.

Exit(0);

You may have noticed that this post took me quite a while to push out. Between purchasing a home, some resource constraints at work, a couple of side projects and maintaining my life, I just didn't have much time to finish writing a more complete post.

I've also cut down the contents of this post, both because you may already be sick of reading and because the information that could be written here could easily fill a full book if you wanted to get into each and every aspect. For my sanity as well as yours, I figured I should keep it shorter. If you have any information to add, please submit comments.

MySQL Workbench 1.0.5 Beta Released

March 7, 2006

Well, finally the guys over at MySQL have released a beta of Workbench. The last few alpha versions were not even usable. You can now boot it up and actually use it. It is one fine tool.

I was able to reverse engineer a model from a database and have all of the indexes and foreign keys show up. The grid is also very easy on the eyes! Workbench supports views, procedures, triggers and all sorts of other features of MySQL 5.x and above. If you are doing work on MySQL, this is probably a tool you would like to have in the ole black hat.

Personally, this is probably one of the best ER diagramming tools I have seen. You might want to check out the public beta on MySQL Downloads.

Preparing for MySQL 5 Certification

March 2, 2006

Well, I am taking the plunge. Since I already have the Zend Certification, I am going to start pursuing the MySQL 5 Developer certification. I find certification exams pretty hard, seeing as you have to pull out all of your knowledge on a particular subject within half a minute. Even more troubling is that this exam has 2 parts! I think it will go well; however, I am not yet as well versed in the new MySQL features because we haven't been using them in production at my current company.

It is only a matter of time before we start working MySQL 5 into our CMS and completely rewrite it. So I am using this both to get well versed in the new features of MySQL 5 and how they are used, and to learn how their implementation differs from other databases. I think this will be a very interesting experience for gaining more knowledge of MySQL, and if it proves successful and fun, I might consider taking the DBA exams.

I will probably be writing a ton of software in my spare time because of this, to really get a hold of and memorize all of the different functionality of MySQL and its tools. Maybe I will finally get some of my development ideas out of the idea stage and into working software!

MySQL 5.x

January 15, 2006

Well, well, as with all of my server updates tonight, I decided to install MySQL 5.x as well. Even though, yet again, the control panel is lame and didn't offer it, it was a simple upgrade and everything went smoothly. Looks like I have a bunch of new toys to learn and play with. I think I might just build a community site using some of these new features :)

Although development takes a long time, there is nothing like using the latest technology to make everything run swiftly! Hopefully I can get an update to the next step up from Apache soon. Maybe I just need to write a PHP-based control panel that automates setting up accounts for me. While I do not have too many accounts, I do seem to like buying domain names and stashing them on the server.

Look for some new blogs concerning the new features I play with in MySQL :)