My Desktop Welcomes Kubuntu
June 22, 2008
Well, I finally got a project out of the way this weekend that I had been meaning to do for a while. However, it was fairly complex to actually get Kubuntu, or any Linux distribution, up and running on my desktop. One mouse even died during the experience (I must note that it was already misbehaving and in need of a replacement).
The hardest thing about getting this install done is that, like most technologists, I do not have an average computer setup. Currently it has 3 SATA drives: 2 WD Raptors and another drive simply for storage. Since the Raptor drives operate at 10,000 RPM and hold 36GB each, they are a great fit for anything that needs to be fast, while the slower 7,200 RPM 120GB drive is great for all of my storage. Add my older AMD Athlon 3200+ with 2GB of ECC registered memory, a nice PCI-E XFX GeForce 8400 and 2 CD/DVD burners, and it was a pain to get it all up and running correctly. On top of that, nForce 4 boards can be a pain to get running right.
After about 15 hours of investigating and researching, I was finally able to find all of the flags to pass to the kernel. Before I found all of these flags I was plagued with freezes and could hardly get anything done. It is very nice to have this complete and I am loving it. The next step is getting my dual monitors to work without crashing.
Next projects on the list:
- Finish setting up my Linux environment for development.
- Start writing the next blog post for performance (hopefully a little more thorough).
- Start to get a preview version of a current project I want to release at some point (should be helpful to organizations that deal with development).
- Start writing a book (maybe, eventually).
Who knows, there are so many things to do and just simply not enough time. In case you are interested in the flags that I had to use for my install of Kubuntu and you are looking to do the same thing:
noapic nolapic acpi=off irqpoll pci=noacpi pnpbios=off
PHP Performance Series: Maximizing Your MySQL Database
June 18, 2008
In the first article of the PHP Performance Series, I focused on PHP Caching Techniques. This time I want to talk about maximizing your database. This article will deal mostly with MySQL; however, many of the points should carry over even if you do not use MySQL directly.
Application SQL Performance
Application level SQL performance is different from the performance of the SQL query itself: it is about how the queries have been designed to work within the application. Many of the items I address in this area are about designing your application to make fewer queries, thus improving scalability and likely performance. However, performance does not always equal scalability, just as scalability does not always equal performance.
If you have read my blog before you may notice that I have used some of this content before, but I am including it in this section for the sake of completeness.
Lazy Connections
Lazy connections are a great step for applications that do not need the database throughout a full request, or may not need it at all. The concept is to not open the connection to the database until it is absolutely essential, which keeps your connection pool free of massive numbers of sleeping connections.
Simple Lazy Connection Example
While not a full example, I believe this shows you a simple technique in handling lazy connections.
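class My_Db
{
    private $_connected = false;
    private $_connection;
    private $_params = array();

    // Only stores the connection settings; nothing is opened here.
    public function connect($host, $user, $pass, $db)
    {
        $this->_params = array($host, $user, $pass, $db);
    }

    // Opens the real connection the first time it is needed.
    private function _connect()
    {
        list($host, $user, $pass, $db) = $this->_params;
        $this->_connection = mysql_connect($host, $user, $pass);
        if ($this->_connection && mysql_select_db($db, $this->_connection)) {
            $this->_connected = true;
        }
    }

    public function query($query)
    {
        if (!$this->_connected) {
            $this->_connect();
        }
        return mysql_query($query, $this->_connection);
    }
}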
Iterating Queries
This is one of the most common problems I see when looking over another developer's code, or even several of the open source projects out there. I am defining an iterating query as a query that executes inside a loop. These can be very expensive and are often not needed at all.
An Iterating Query Example
if (isset($_GET['ids'])) {
    foreach ($_GET['ids'] as $id) {
        $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $id);
        $row = mysql_fetch_assoc($rs);
        print_r($row);
    }
}
Fixing The Iterating Query Example
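// An empty list would produce the invalid SQL "IN ()", so skip the query entirely.
if (isset($_GET['ids']) && is_array($_GET['ids']) && count($_GET['ids']) > 0) {
    $ids = implode(',', array_map('intval', $_GET['ids']));
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id IN (' . $ids . ')');
    while ($row = mysql_fetch_assoc($rs)) {
        print_r($row);
    }
}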
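The same single-query idea can also be sketched with bound placeholders instead of string concatenation. This is only a sketch under assumptions that are not in the original post: it uses PDO rather than the mysql_* functions, an in-memory SQLite database stands in for MySQL so the snippet runs on its own, and the helper name fetchByIds and the my_name column are hypothetical.

```php
<?php
// Hypothetical helper: fetch many rows with a single IN (...) query.
function fetchByIds(PDO $db, array $ids)
{
    // Cast every id to an integer and drop duplicates.
    $ids = array_values(array_unique(array_map('intval', $ids)));
    if (count($ids) === 0) {
        return array(); // avoid the invalid SQL "IN ()"
    }

    // One "?" placeholder per id, e.g. "?,?,?".
    $marks = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $db->prepare(
        'SELECT my_id, my_name FROM my_table WHERE my_id IN (' . $marks . ') ORDER BY my_id'
    );
    foreach ($ids as $i => $id) {
        // Positional placeholders are numbered from 1.
        $stmt->bindValue($i + 1, $id, PDO::PARAM_INT);
    }
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Stand-in database (assumption) so the sketch is runnable; use a MySQL DSN in practice.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE my_table (my_id INTEGER PRIMARY KEY, my_name TEXT)');
$db->exec("INSERT INTO my_table VALUES (1, 'one'), (2, 'two'), (3, 'three')");

// 'abc' is cast to 0, which matches no row, so only ids 1 and 3 come back.
$rows = fetchByIds($db, array('1', '3', 'abc'));
```

However many ids are passed in, the database is hit once, and each value reaches the query as an integer.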
Need Based Selects
There is no need to select information that you do not need. For starters, this increases the memory usage on the request as well as the I/O time spent fetching the record data from columns that are not being utilized and transferring it to PHP. The more data you select, the slower the query and the larger the memory footprint. This is especially true with the TEXT and BLOB column types.
SELECT * IS BAD!
This is actually bad in 2 different areas. First, and of higher concern: if you ever need to change something in the application, how do you know which columns a query actually uses? Say you have a large application with thousands of files that run select queries; you will have no idea where each column is used without researching each and every area of the application that has the wonderful SELECT *. Basically your maintainability slowly dies. Secondly, as I stated before, there is also a performance and memory hit. Simply stated, do not use SELECT *.
One question that is commonly asked after this comment is: what if I am using all of the columns? Even so, are you still going to be using all of the columns in 3 months, 6 months, 1 year, 5 years?
Use the Correct Data Type (Don’t Quote Everything)
Yes, utilizing the correct type of data in your query does matter. Using the wrong type can cause the database to miss indexes or come back with invalid results. Besides that, it is slower since the database has to convert the data into the correct type.
Example of Incorrect Data Types
if (isset($_GET['id'])) {
    $id = mysql_real_escape_string($_GET['id']);
    $rs = mysql_query("SELECT * FROM my_table WHERE my_id = '{$id}'");
}
Example of Correct Data Types
if (isset($_GET['id'])) {
    $id = (int) $_GET['id'];
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . $id);
}
By the way, the 2nd one is also quicker on the PHP side because we only had to type cast to an integer instead of running the value through mysql_real_escape_string.
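One thing to keep in mind with a plain (int) cast is that input such as 'abc' silently becomes 0. If you would rather reject bad input than query for id 0, something along these lines might work. It is a minimal sketch; the helper name readId is hypothetical and not part of the original examples.

```php
<?php
// Hypothetical helper: return a positive integer id, or null when the input is not one.
function readId($raw)
{
    // FILTER_VALIDATE_INT returns the integer on success and false on failure.
    $id = filter_var($raw, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 1),
    ));

    return $id === false ? null : $id;
}

$good = readId('42');   // a real integer, safe to place in the query
$bad  = readId('42abc'); // rejected instead of being truncated to 42
```

The value that reaches the query is still a real integer, so the point about matching the column type holds.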
Hierarchical Data
When utilizing navigation trees, you should ensure that you are utilizing a proper technique and not hammering your database to build them. Further, cache the tree instead of rebuilding it from the database on every request! There is already a great article over at MySQL on Managing Hierarchical Data so I am not going to go deep into this. But any time you are managing a tree of data, please spare other developers and do it correctly.
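As one illustration of not pushing the database, the rows for a navigation tree can be fetched with a single query and assembled in PHP, and the finished array is then the thing you would cache. The sketch below assumes a simple parent-pointer layout with hypothetical id, parent_id and title columns; it does not show the nested set approach, and buildTree is a made-up helper name.

```php
<?php
// Hypothetical helper: turn flat (id, parent_id, title) rows into a nested tree.
function buildTree(array $rows)
{
    // Index every row by id first so the order of the rows does not matter.
    $nodes = array();
    foreach ($rows as $row) {
        $row['children'] = array();
        $nodes[$row['id']] = $row;
    }

    // Attach each node to its parent by reference; nodes without a known parent are roots.
    $tree = array();
    foreach ($nodes as $id => &$node) {
        if ($node['parent_id'] !== null && isset($nodes[$node['parent_id']])) {
            $nodes[$node['parent_id']]['children'][] = &$node;
        } else {
            $tree[] = &$node;
        }
    }
    unset($node); // break the last reference left behind by the loop

    return $tree;
}

// Rows as they might come back from one SELECT id, parent_id, title query.
$rows = array(
    array('id' => 3, 'parent_id' => 2, 'title' => 'Widgets'),
    array('id' => 1, 'parent_id' => null, 'title' => 'Home'),
    array('id' => 2, 'parent_id' => 1, 'title' => 'Products'),
    array('id' => 4, 'parent_id' => 1, 'title' => 'About'),
);
$tree = buildTree($rows);
```

This avoids running one query per level or per node, which is the iterating query problem in another form.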
Database Design
Database design is typically the first area where you find problems start. A bad data model can plague your application with both performance and maintainability concerns. However, also note that the more performance driven you make your database, the less maintainable it can become (debatable; it can really depend on size and scale).
Please note that this is not an all-encompassing list; I wanted to give more of a developer's point of view on optimization techniques in database design rather than provide a full guide. If you are looking for a deeper level of information I invite you to go to the MySQL Manual on Database Design. Feel free to add recommendations in the comments.
Normalization
Normalization is a technique for minimizing the duplication of information. It is typically the best place to start: if you are currently having problems and your database is not well normalized, focus on that first. You likely have tables, columns or data that should not be structured the way they currently are.
Columns instead of Table Example
A common problem in many applications is that they store repeating data in extra columns instead of actually making a one-to-many table. Take for instance the following table:
| Column Name | Column Type |
|---|---|
| user_id | integer |
| user_email | varchar |
| user_website | varchar |
| user_website_2 | varchar |
| user_website_3 | varchar |
What you are likely seeing here is that a user asked to have up to 3 websites and the developer working on it figured there would never need to be any more. This is a good candidate for both further modularity and normalization.
Table instead of Columns Example
Using a table instead of columns would better support this feature as well as allow for further growth in the future.
app_user

| Column Name | Column Type |
|---|---|
| user_id | integer |
| user_email | varchar |

app_user_website

| Column Name | Column Type |
|---|---|
| website_id | integer |
| user_id | integer |
| website_url | varchar |
You might be thinking, what does this have to do with performance? It comes down to maintainability, better handling of your indexes, and a smaller initial table. Table size matters, especially once you have to start adding indexes to columns that should not need them. Take both examples and try to find the users who have not entered a website. With the first design, and no index on the user_website columns, this becomes very slow, and fixing it means adding yet another index. With the second design it is a simple select au.user_id from app_user au where au.user_id NOT IN (select distinct user_id from app_user_website), which can use the foreign key you would likely have defined on user_id in the app_user_website table.
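To make the second design concrete, here is a small runnable sketch of the two tables and the "users without a website" query. The assumptions are mine rather than the post's: PDO with an in-memory SQLite database stands in for MySQL, the index name idx_website_user and the sample rows are invented, and user_id is declared NOT NULL because a NULL in the subquery would make NOT IN return no rows.

```php
<?php
// Stand-in database (assumption) so the sketch runs on its own.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE app_user (
    user_id INTEGER PRIMARY KEY,
    user_email TEXT
)');
$db->exec('CREATE TABLE app_user_website (
    website_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES app_user (user_id),
    website_url TEXT
)');
// The lookup in the subquery relies on this index on the foreign key column.
$db->exec('CREATE INDEX idx_website_user ON app_user_website (user_id)');

// Invented sample data: user 2 has no website.
$db->exec("INSERT INTO app_user VALUES
    (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com')");
$db->exec("INSERT INTO app_user_website VALUES
    (1, 1, 'http://a.example.com'),
    (2, 1, 'http://a2.example.com'),
    (3, 3, 'http://c.example.com')");

$sql = 'SELECT au.user_id FROM app_user au
        WHERE au.user_id NOT IN (SELECT DISTINCT user_id FROM app_user_website)';
// FETCH_COLUMN returns a flat list of the first column.
$withoutWebsite = array_map('intval', $db->query($sql)->fetchAll(PDO::FETCH_COLUMN));
```

A fourth or fifth website for user 1 is just another row; no schema change and no extra index are needed.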
Denormalization
Denormalization is the concept of copying data into other tables in order to reduce the number of joins that you need. The copies should almost always be maintained with triggers; if you are unable to use triggers, please ensure that anything that creates, modifies or deletes the data goes through a layer of encapsulation. Otherwise you will end up with stale records.
You should only need to do this when you have exhausted all of the other routes: you have checked and created indexes, confirmed that the data access is really needed, and found no other method that meets your needs.
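As a rough illustration of trigger-maintained denormalized data, the sketch below keeps a per-user website count on the user table so that it can be read without a join. Everything specific here is an assumption: the website_count column and the trigger names are hypothetical, a derived count is just one form of denormalization, and an in-memory SQLite database via PDO stands in for MySQL (trigger syntax differs slightly between databases).

```php
<?php
// Stand-in database (assumption) so the sketch runs on its own.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE app_user (
    user_id INTEGER PRIMARY KEY,
    user_email TEXT,
    website_count INTEGER NOT NULL DEFAULT 0
)');
$db->exec('CREATE TABLE app_user_website (
    website_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    website_url TEXT
)');

// The triggers keep the copied value in step with the source table.
$db->exec('CREATE TRIGGER trg_website_insert AFTER INSERT ON app_user_website
    FOR EACH ROW BEGIN
        UPDATE app_user SET website_count = website_count + 1
        WHERE user_id = NEW.user_id;
    END');
$db->exec('CREATE TRIGGER trg_website_delete AFTER DELETE ON app_user_website
    FOR EACH ROW BEGIN
        UPDATE app_user SET website_count = website_count - 1
        WHERE user_id = OLD.user_id;
    END');

$db->exec("INSERT INTO app_user (user_id, user_email) VALUES (1, 'a@example.com')");
$db->exec("INSERT INTO app_user_website (user_id, website_url) VALUES (1, 'http://a.example.com')");
$db->exec("INSERT INTO app_user_website (user_id, website_url) VALUES (1, 'http://a2.example.com')");
$db->exec('DELETE FROM app_user_website WHERE website_id = 1');

// Two inserts and one delete leave a single website for user 1.
$count = (int) $db->query('SELECT website_count FROM app_user WHERE user_id = 1')->fetchColumn();
```

Because the database itself applies the change, no code path in the application can forget to update the copy, which is the stale record problem described above.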
Table Types
Use the correct table type for what you are doing or attempting to do. I suggest setting up a matrix of what you need your table to do as well as your database as a whole. This first step is often neglected by many developers.
| Feature | Option |
|---|---|
| Read vs. Write > 15% | Yes/No |
| Transactions | Yes/No |
| Foreign Key Support | Yes/No |
| Full-Text Indexes | Yes/No |
The list above is by no means a full list, but you should document what you need and what you are using the database for. For example, if you are reading and writing on the same database table, MyISAM is likely a bad idea once the smaller share of that mix (reads or writes) is greater than 15%. MyISAM locks whole tables: a write that needs the table lock has to wait for a long read to finish, and any further reads then queue until that write has completed. So make a list and figure out what needs to be there. This will certainly help you as you grow your database.
SQL Query Optimizations
Optimizing your SQL to perform is not rocket science; however, this seems to be the one area where applications start crumbling once they hit popularity. Simply put, as features are developed and change over time, the effect on the database is rarely taken into account.
A simple rule to follow is to design your queries around your existing database architecture and rules; when that cannot be achieved, refactor and adjust the database so it can handle the new situations.
The Simple Rules
- Use your explain/execution plan
- The fewer joins the better
- Ensure you are utilizing your indexes (see first bullet)
- Temporary tables can be good when doing operations on complex data sets
- Stay away from derived tables and non-materialized views (see above bullet)
- Roll up data that can be aggregated
- Select the columns you need, not SELECT *
If you are looking for ways to better optimize your queries, again, please go to the MySQL Manual on Query Optimization.
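Of the rules above, rolling up data that can be aggregated is perhaps the easiest to show in a few lines. The sketch below is only an illustration under my own assumptions: the page_hit and page_hit_daily tables are hypothetical, and PDO with an in-memory SQLite database stands in for MySQL.

```php
<?php
// Stand-in database (assumption) so the sketch runs on its own.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Raw table: one row per hit. Summary table: one row per page per day.
$db->exec('CREATE TABLE page_hit (
    hit_id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL,
    hit_date TEXT NOT NULL
)');
$db->exec('CREATE TABLE page_hit_daily (
    page_id INTEGER NOT NULL,
    hit_date TEXT NOT NULL,
    hits INTEGER NOT NULL,
    PRIMARY KEY (page_id, hit_date)
)');
$db->exec("INSERT INTO page_hit (page_id, hit_date) VALUES
    (1, '2008-06-17'), (1, '2008-06-17'), (2, '2008-06-17'), (1, '2008-06-18')");

// The roll-up: aggregate once (for example from a scheduled job), not on every page view.
$db->exec('INSERT INTO page_hit_daily (page_id, hit_date, hits)
    SELECT page_id, hit_date, COUNT(*) FROM page_hit GROUP BY page_id, hit_date');

// Reports read the small summary table and name only the column they need.
$stmt = $db->prepare('SELECT hits FROM page_hit_daily WHERE page_id = ? AND hit_date = ?');
$stmt->bindValue(1, 1, PDO::PARAM_INT);
$stmt->bindValue(2, '2008-06-17', PDO::PARAM_STR);
$stmt->execute();
$hits = (int) $stmt->fetchColumn();

$summaryRows = (int) $db->query('SELECT COUNT(*) FROM page_hit_daily')->fetchColumn();
```

The report query touches one row in the summary table instead of counting every raw hit each time it runs.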
Exit(0);
You may have noticed that this blog post has taken me quite a while to push out. Between being in the middle of purchasing a home, some resource constraints at work, a couple of side projects and maintaining my life, I just didn’t have much time to finish writing a more complete post.
To go a little further, I’ve cut down the contents of this post, as you may already be sick of reading and the material could easily have filled a book if I got into each and every aspect. I figured for my sanity as well as yours I should cut it shorter. If you have any information to add please submit comments.



