Like any breathing Rails developer, I love blogging about performance. I do it all the time. I’ve done it here, here, and quite famously, here.
But one thing I haven’t done is blog about Rails performance from a perspective of experience. But tripling in traffic for a few months in a row has a way of changing that.
So now I’m a real Rails performance guy. Ask me anything about Rails performance, and I’ll tell you to get back to me in a couple months, because this aint’ exactly yellowpages.com that I’m running here. BUT, these are the lessons and facts from our first few months of operation:
- One combined Rails server+Mysql slice at Slicehost is handling about 3000 daily visits and 30,000 daily pageviews (on a highly real time, interactive site) with relative ease. Almost all pageviews less than 2 seconds, most less than 1.
- Memcached saves our ass repeatedly
- Full text searching (we’re using Thinking Sphinx) saves our ass repeatedly
- BackgroundRb will ruin your life, cron-scheduled rake tasks will save it
- Database ain’t nothing but a chicken wing with indexing
Now there are five salient observations to take from a growing site, but you notice that it was the last one that I chose to single out in the title of this blog? Why? Because, if I called this entry “Rails Performance Blog,” your eyes would glaze over and you’d wouldn’t be able to read through the hazy glare.
Why else? Because the day I spent indexing our tables was the only time in the history of Bonanzle that I will ever bring forth a sitewide 2x-3x performance increase within about 4 hours time. God damn that was a fantastic day. I spent the second half of it writing airy musings to my girlfriend and anyone who would listen about how much fun web sites are to program. Then I drank beer and went rafting. Those who haven’t indexed their DB lately: don’t you hate me and want to be like me more than you ever have before?
Well, I can’t help you with the former, but the latter, that we can work on.
- Download Query Analyzer
- Delete your development.log file. Start your site in development mode. Go to your slowest page. Open your development.log file in an editor that can automatically update as the file changes.
- Look through the queries your Rails site is making. Any query where the “type” column reads “ALL” is a query on which you are searching every row of your database to satisfy the query. Hundreds of rows? OK, whatever. Thousands of rows? Ouch. Tens of thousands of rows (or more)? Your request might never be heard from again.
- Create indexes to make those “ALL”s go away. Adding an index in Rails is the simplest thing ever. In a migration: add_index :table_name, :column_name and you’re done. remove_index :table_name, :column_name and you’re undone.
- Observe that, at least for MySql, queries where you are filtering for more than one attribute in your where clause (e.g., select * from items where status = “active” and hidden = false) are still slow if you create indexes for “status” and “hidden.” Why? I think it’s because the DB ORs them together to find its results. But I don’t know the exact reason, nor do I care. What I do know is that an add_index :items, [:status, :hidden] creates a compound attribute that will get you back to log(n) time in making queries with compound where clauses.
Now, if you are like me or the 50 people in the Rails wiki and forums that learn about this crazy wonderful thing called “indexes,” your first question is “Indexing sounds pretty bangin. Why not just index the hell out of everything?”
Answer: Them indexes aren’t immaculately conceived, son. Every index you create has to be generated and maintained. So the more indexes you create, the more overhead there is to inserting or deleting records from your table. Of course, most queries on most sites are read queries, so you will make up the extra insert/delete time by 10x or more, but if you were to go buck wild and index the farm, you probably wouldn’t be much better off on balance than if you indexed nothing at all. You see why downloading Query Analyzer was the first step?
The general rule that is given on indexes is that most any foreign key should be indexed, and any criteria upon which you regularly search or sort should be indexed. That’s worked well for us. For tables with less than 500 rows, I usually get lazy and don’t do any indexing, and that seems fine. But assuredly, if you’re working on a table with 1,000 or more rows and you’re querying for columns that aren’t indexed, you are 15 minutes away from a beer-enabling, management-impressing performance optimization that would make Ferris Bueller proud.
you are one hard-working intelligent dude, how many job offers do you get per day?
Thanks for the post. It is always nice to read tips from real experience. I think it was worthy to read it 🙂