For the last few days I have been trying to track down why Unity’s PHPBB 2 Forum (http://forum.unity3d.com) was so slow. Page rendering times were taking between 2 and 10 seconds and for me this was just unacceptable.
A proper website should be able to render a majority of the page in less than 250ms, and delivering most of the content in less than 50ms is ideal. Sites like Amazon, Yahoo, and Google have studied the effects of response time vs. features, and have found that response time is often times more important. Greg Linden pulls together a few sources on the topic in his blog.
So, 10 seconds was just unacceptable, and I was bound to find a solution. I started with the low hanging fruit, and installed PHP’s APC cache which is an OpCode Cache that stores compiled versions of the PHP code to reuse on the next request. There is evidence out there to suggest that not only do opcode caches reduce CPU requirements, but they also reduce your memory load. I saw a few sites that claim about a 3-4x performance increase on the CPU and about a 25% reduction in memory usage.
The next step, as the forums were still slow, was to start looking into our MySQL usage. Did the server have enough memory, were the MySQL caches and buffers large enough, or were searches and queries getting pruned to make room for more queries? After using MySQL’s GUI tools through an SSH Tunnel I was able to see that the server had 64MB of query cache, and only 90% of it was being used, so the caches were good.
Next was to look for slow queries, however a slow query or two would be hard to fix considering PHPBB was building the queries as part of the application and should have been tested in advance. If a slow query was the problem, it probably was not part of the application design, but rather an indication of something else that was going wrong. We did notice a few slow queries, but nothing ridiculous. However, what we did notice was that a number of queries were running slow because other ones where holding a lock for a long period. Hmmm?
Next we started looking at the problem from a system level using apps like ‘top’, ‘dstat’, ‘mytop’, etc… From here we found that disk IO was over working. Hmmm… but why?
I later found out that we were serving large files in our forum, however it did not seem like our forum was being used as a hot linking service, and I would suspect that PHPBB has some defense from that, but going forwards this is a concern of mine.
I finally found a post on the web talking about MySQL CPU spikes, and it was related to a corrupted table index. http://forums.fedoraforum.org/showthread.php?t=232008
After reading the forum for a bit, I thought, well there seems to be a few low risk commands I can run to look into our table integrity. Here is the full command list http://dev.mysql.com/doc/refman/5.0/en/mysqlcheck.html
First I ran:
mysqlcheck -A -F -u [username] –p
This reported several table closed improperly errors, and took about 2-6 mins to run without taking down the forums.
Next I ran:
mysqlcheck -A -q -r -u [username] –p
This is basically the fastest repair option, and would only fix minor corruption. It took about 5-10 min to run.
The result was that our MySQL usage dropped from 100-300% to 3-6% and page response times being about 200-500ms. I have a few other things I would like to try, from using our CDN with the forums to DB optimizations, to other server upgrades.
Over the weekend, I think I am going to run:
mysqlcheck --all-databases --optimize -u [username] –p
Which should be more thorough, but might take an hour. This will defrag the table and resolve some indexing errors.
Recent Comments