Rails: Beware the custom truncate

Ran into an interesting bug a few days ago that I’ve been meaning to document for anyone who has written their own truncating function in Rails. If you have, I’d guess it probably looked something like this:

def truncate(str, length)
	return '' if str.blank?
	truncated = str.size > length
	(str[0...(truncated ? length - 3 : length)] + (truncated ? "..." : '')) # exclusive range, so result stays within length
end

Fine enough. Unless you happen to be truncating user-given strings.

If you are letting your users enter the information that gets truncated, chances are that some of them are entering Unicode characters for accents, curly quotation marks, etc. Because non-ASCII characters take 2-4 bytes each in UTF-8, and the slice above counts bytes rather than characters (as plain String slicing does on Ruby 1.8), this truncate can split a character in half and cause general headaches whenever it truncates text that contains them.

Split-in-half characters are bad news. They will cause errors like “illegal JSON output,” which is how I originally spotted this as a problem with our truncate method.

The solution is to take a page from Rails’ own truncate, and use #mb_chars. So a hand-written truncate that works for unicode would be:

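def truncate(str, length)
	return '' if str.blank?
	chars = str.mb_chars # character-aware proxy, so slicing never splits a multibyte character
	truncated = chars.size > length # compare characters, not bytes
	# exclusive range: keep (length - 3) characters plus "...", so the result is at most length characters
	(chars[0...(truncated ? length - 3 : length)] + (truncated ? "..." : '')).to_s
end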
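
If you want to see the split for yourself, here is a minimal sketch. It assumes Ruby 1.9 or later, where String#[] indexes by character, so byteslice stands in for the byte-based slicing that Ruby 1.8 did by default. The sample string is made up for illustration.

```ruby
text = "caf\u00e9 au lait"   # "café au lait"; the é takes two bytes in UTF-8

# Cutting at a byte offset can land in the middle of the é.
broken = text.byteslice(0, 4)
puts broken.valid_encoding?   # false

# Cutting at a character offset keeps the é intact.
intact = text[0, 4]
puts intact.valid_encoding?   # true
puts intact                   # café
```

The invalid string in the first case is the kind of thing that a JSON encoder will refuse to serialize.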

You’re welcome.

Join Multiple Copies of Same Model in Thinking Sphinx Index

This was a vexing problem that probably affects 0.1% of all Thinking Sphinx users, but for those select few, you can benefit from my pain.

We have a model with the following associations:

has_many :merch_match_data_tags, :class_name => "MerchMatchItemData", :dependent => :delete_all
has_one :mm_bizarre, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::BIZARRE }
has_one :mm_good_picture, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::NICE_PICTURE }
has_one :mm_funny, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::FUNNY }

Intending to add these to a Sphinx index, we used the following code:

has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_funny.tag_count, :as => :funny_points

What perplexed me after trying this was that while the “good_picture_points” could be queried and sorted, bizarre_points and funny_points returned no Sphinx results. Looking into the output generated by thinking_sphinx:configure, I discovered why:

...
LEFT OUTER JOIN `merch_match_item_datas` ON merch_match_item_datas.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 0
LEFT OUTER JOIN `merch_match_item_datas` mm_bizarres_items ON mm_bizarres_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 2
LEFT OUTER JOIN `merch_match_item_datas` mm_good_values_items ON mm_good_values_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 7
LEFT OUTER JOIN `merch_match_item_datas` mm_funnies_items ON mm_funnies_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 4
...

The problem was that, in determining the SQL to build, Thinking Sphinx uses the first association it comes across as the default set of conditions for all future joins to the table. So, in this case, anything that joined the merch_match_item_datas table was going to be joining that table with the data_tag_id = 0 condition of our first declared association (mm_good_picture). That is, mm_bizarre was now looking for rows with both data_tag_id = 0 and data_tag_id = [id of bizarre tag]. No row can satisfy both, so nothing was returned.

After a bit of head scratching, I came up with the following workaround for this:

has merch_match_data_tags.tag_count
has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_good_value.tag_count, :as => :good_value_points

Basically, just make the first association that Thinking Sphinx encounters an unqualified, unfiltered association to the merch_match_item_datas table. This ensures that the proper join structure is set up, so all of the subsequent has attributes function as they should.

Hope that I’m not the only one ever to find this useful.

Hint for Job Seekers: Wake Up and Write!

Over the last three years I’ve spent at least 6 months hiring, which equates to more than 1,000 applicants reviewed. But even before I had seen our 50th applicant, I was stunned by the applicant apathy that pervaded our job inbox. At first I figured it must be us. When we were initially hiring, it was for the opportunity to work for peanuts at an unproven web startup. Surely this must explain why 95% of the applications we received were a resume accompanied by a generic cover letter, or no cover letter at all.

But now that we have proven our business, with ample resources to bring aboard top tier talent, I am baffled at the scarcity of job seekers who understand the opportunity that the cover letter presents for them to stand out from the other 49 applications I’ll receive today.

Think about it, job seeker. Every day, my inbox is flooded with anywhere from 25-50 applicants. Each of these applicants sends a resume, and each of these resumes details experience at a bunch of companies I haven’t heard of in job titles that can only hint at what the person might have really done on a day-to-day basis.

If you were me under these circumstances, how would you weed out the applicants that are most interesting? What would wake you up from the torrent of generic cover letters and byzantine job histories?

P-E-R-S-O-N-A-L-I-T-Y.

When I am not paying close attention, it feels like the same person has been applying for our job repeatedly for months, each time with a slightly different form letter to accompany his or her list of job titles.

The applicants that wake me up from this march of sameness are those 5% that demonstrate they have actually taken the 5 minutes to understand what Bonanzle is, what about the company gets them excited, and why they would be a good fit relative to our objectives and specific job description. (IMPORTANT NOTE: Batch-replacing [company name] with “Bonanzle” does not qualify as personalizing)

Interestingly, the applicants for business-related positions we’ve posted in the past tend to do a comparatively phenomenal job at this. If only these business people had design, UI, or programming skills, they would immediately ascend to the top of our “To interview” list. But the actual creators — programmers, designers, and UI experts — just don’t seem to get it. I suppose it could be a chicken-and-egg situation, where the minority of them that do get it are swooped up immediately by companies that crave that glimpse of personality, and the rest of them keep blindly applying to every job on Craigslist without giving a damn.

The other sorely underrepresented aspect to a good application? Decent portfolios. If you’re a designer, take the slight interest I’ve already expressed toward resumes, and cut it in half. Your value is much easier to ascertain by what you’ve done than what you’ve said, and you have the perfect opportunity to show us what you’ve done by creating a modern, user friendly portfolio. On average, I’d estimate I see about one modern, well constructed portfolio of these for every 20 designers that apply. (Personal bias: Flash-based portfolio sites load slow and feel staid; I might be unique in that opinion though)

I see a huge opportunity to awaken and realize how little effort it would take to create an application that shines. You want to be a real overachiever? Why not spend 15 minutes to sign up for an account and browse the site, and incorporate that experience into your cover letter? Amongst more than 50 applicants for our first hire, Mark Dorsey, aka BonanzleMark aka the best hire I’ve made so far, was the SINGLE applicant that spent the 15 minutes required to do this. In more than 500 applications since, I have yet to see it again.

The world is rife with creative ways to get your application noticed. All it takes is 15-30 minutes of your time (including time to personalize the letter) to rise into the 90th percentile. If it’s a job you care about, you’re earning a potentially $100k salary for 30 minutes of work = about $3-4k per minute. I know lawyers that don’t even make that much.

Rails tests: One line to seriously pump up the speed

Alright, I admit it: I didn’t seriously write tests for Bonanzle until a couple months ago. But I had my reasons, and I think they were good ones.

Reason #1 was that I hated everything about fixtures. I hated creating them, I hated updating them every time we migrated, and I hated remembering which fixture corresponded to which record. Factory Girl was the panacea for this woe.

Reason #2 was that it took an eon to run even a single test. When trying to iterate tests and fixes, this meant that I ended up spending my time 10 parts waiting to one part coding. After much digging, I eventually determined that 90% of our test load time was attributable to caching all our classes in advance. Of course, my first inclination was just to not cache classes in our test environment, which actually worked reasonably well to speed tests the hell up, until I started writing integration tests, and found our models getting undefined and unusable over the course of multiple requests. Then, I found the answer:

config.eager_load_paths.clear

This line basically says that, even if you set config.cache_classes = true, Rails should not try to pre-load all models (which, in our case, is more than 100).

Adding this line allows us to cache classes in test (which fixes the integration test problems), while at the same time getting the benefits of a configuration that doesn’t take 2 minutes to load.

(Of course, also key was configuring our test rakefile such that we could run single tests, rather than being obligated to run the entire suite of tests at once. If anyone finds this post and doesn’t yet know how to invoke a single test, post a comment and I’ll get unlazy and post the code for that.)

Get Session in Rails Integration Test

From the results Google gives on this, it seems that about three people in the world are using integration tests in Rails, and two of them stopped programming in 2007.

My goal: to get at session data from within an integration test.

Bad news: I don’t know any way to do this without first calling a controller action from within your integration test.

Good news: I have example code on how to get at it after making a request.

def add_to_cart_integration_test
  s = open_session
  s.post url_for(:controller => 'shopping_cart', :action => :add_item, :item_id => 1)
  s.session.data # a populated session hash
  s.flash # your flash data
end

And there you have it. Here is a more detailed example that may be of use, derived from working integration tests in our codebase:

open_session do |s|
	item_1 = FactoryHelper::ItemFactory.create_sellable_item
	add_item_to_cart(item_1, s)
	user = Factory(:activated_user, {:password => USER_PASSWORD})
	login_user(user, USER_PASSWORD, s)

	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert_equal 1, all_offers.size # expected value goes first
	the_offer = all_offers.first

	# Trial 1:  Simulate dragging item out of cart
	s.post url_for(:controller => 'lootbins', :action => :remove_from_lootbin_draggable, :item => item_1.id)

	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert all_offers.empty?
	assert !Offer.exists?(the_offer)

	# Trial 2:  Simulate removing item from cart on cart page
	add_item_to_cart(item_1, s)
	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert_equal 1, all_offers.size # expected value goes first
	the_offer = all_offers.first

	s.put url_for(:controller => 'offers', :action => :remove_from_offer, :item_id => item_1.id, :id => the_offer.id)
	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert all_offers.empty?
	assert !Offer.exists?(the_offer)
end

And here are the helpers involved:

def add_item_to_cart(item, os)
	items_in_bin = begin
		lootbin = Lootbin.new(os.session.data) # equivalent to ApplicationController#my_lootbin
		lootbin.offer_items.size
	rescue
		0
	end
	os.post url_for(:controller => 'lootbins', :action => :add_to_lootbin_draggable, :item => item[:id])
	lootbin = Lootbin.new(os.session.data) # equivalent to ApplicationController#my_lootbin
	assert lootbin
	assert lootbin.has_items?
	assert_equal items_in_bin + 1, lootbin.offer_items.size # expected value goes first
end

def login_user(user, password, integration_session)
	integration_session.https!
	integration_session.post url_for(:controller => 'sessions', :action => :create, :username => user.user_name, :password => password)

	assert_equal user.id, integration_session.session.data[:user_id] # expected value goes first
end

Paypal Masspay Ruby Example

The title of this post is a Google query that yielded no good results, but plenty of puzzled users trying to figure out how to make it work. I’ve only been playing with it for about half an hour, but this code is getting Paypal to tell me that the request is successful:

require 'net/https' # Net::HTTP with SSL support
require 'uri'
require 'cgi'

def self.send_money(to_email, how_much_in_cents, options = {})
	credentials = {
	  "USER" => API_USERNAME,
	  "PWD" => API_PASSWORD,
	  "SIGNATURE" => API_SIGNATURE,
	}

	params = {
	  "METHOD" => "MassPay",
	  "CURRENCYCODE" => "USD",
	  "RECEIVERTYPE" => "EmailAddress",
	  "L_EMAIL0" => to_email,
	  "L_AMT0" => ((how_much_in_cents.to_i)/100.to_f).to_s,
	  "VERSION" => "51.0"
	}

	endpoint = RAILS_ENV == 'production' ? "https://api-3t.paypal.com" : "https://api-3t.sandbox.paypal.com"
	url = URI.parse(endpoint)
	http = Net::HTTP.new(url.host, url.port)
	http.use_ssl = true
	all_params = credentials.merge(params)
	stringified_params = all_params.collect { |tuple| "#{tuple.first}=#{CGI.escape(tuple.last)}" }.join("&")

	response = http.post("/nvp", stringified_params)
end

Certainly not the tersest solution, but I’ve kept it a bit verbose to make it clearer what’s happening.
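
For what it's worth, here is a rough sketch of the pieces around that request: building the name-value body, formatting the amount, and reading the reply. The helper names (encode_nvp, decode_nvp, cents_to_amount) are hypothetical, it swaps the hand-rolled collect/join for URI.encode_www_form from the standard library, and the sample reply string is made up for illustration rather than captured from Paypal. Whether the API insists on exactly two decimal places is an assumption worth checking against the NVP docs.

```ruby
require 'uri'

# Encode a hash of parameters as a name-value-pair request body.
def encode_nvp(params)
  URI.encode_www_form(params)
end

# Decode a name-value-pair reply into a flat hash of strings.
def decode_nvp(body)
  Hash[URI.decode_www_form(body)]
end

# Format non-negative integer cents as a decimal string with two places,
# without going through floating point.
def cents_to_amount(cents)
  format("%d.%02d", *cents.to_i.divmod(100))
end

body = encode_nvp("METHOD" => "MassPay", "L_AMT0" => cents_to_amount(1250))
puts body                                   # METHOD=MassPay&L_AMT0=12.50

# Illustrative reply only (not real Paypal output).
reply = decode_nvp("ACK=Success&VERSION=51%2e0")
puts reply["ACK"] == "Success"              # true
```

In send_money you would run decode_nvp on response.body and check the ACK field before treating the payout as sent.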

One point of note is that you’ll need separate credentials when submitting to sandbox vs. production. You can sign up for a sandbox account by clicking “Sign up” from this page. After you have your account, login, click on “profile,” then get your API access credentials.

Here is the PHP example I based my code on and here is the Paypal Masspay NVP documentation that was marginally helpful in figuring out what params to pass.

Best of Rails GUI, Performance, and other Utilities

I’m all about putting order to “best of” and “worst of” lists, so why not give some brief props to the tools, plugins, and utilities that make life on Rails a wondrous thing to behold?

5. Phusion Passenger. OK, this would probably be first on the list, but it’s already been around so long that I think it’s officially time to start taking it for granted. But before we completely take it for granted, would anyone care to take a moment to remember what life was like in a world of round-robin balanced Mongrel web servers? You wouldn’t? Yah, me neither. But no matter how I try, I cannot expunge memories of repeatedly waking up to the site alarm at 7am to discover somebody had jammed up all the Mongrels with their stinking store update and now I’ve got to figure out some way to get them to stop.

4. jQuery / jRails. This probably deserves to score higher, as the difference between jQuery and Prototype is comparable to the difference between Rails and PHP. But since it’s not really Rails-specific, I’m going to slot it at four and give major props to John Resig for being such an attentive and meticulous creator. Without jQuery and jQuery UI, the entire web would be at least 1-2 years behind where it is in terms of interactivity, and I don’t think that’s hyperbole. (It even takes the non-stop frown off my face when I’m writing Javascript. With jQuery, it’s merely an intermittent frown mixed with ambivalence!)

3. Sphinx / Thinking Sphinx. There’s a reason that, within about six months time, Thinking Sphinx usurped the crown of “most used full text search” utility from “UltraSphinx.” And the reason is that it takes something (full text search) that is extremely complicated, and it makes it stupidly easy. And not just easy, but extremely flexible. Bonanzle has bent Sphinx (0.9.8) into doing acrobatics that I would have never guessed would be possible, like updating it in nearly-real time as users log in and log out. Not to mention the fact it can search full text data from 4 million records in scant milliseconds.

Sphinx itself is a great tool, too, though if I were going to be greedy I would wish that 0.9.9 didn’t reduce performance over 0.9.8 by around 50% in our testing, and I would wish that it got updated more often than once or twice per year. But in the absence of credible competition, it’s a great search solution, and rock solidly stable.

2. New Relic. OK, I’ll admit that I’ve had my ups and downs with New Relic, and with the amount of time I’ve spent complaining to their team about the UI changes from v1 to v2, they probably have no idea that it still ranks second in my list, ahead of Sphinx, as most terrific Rails tools. But it does, because, like all the members of this list, 1) the “next best” choice is so far back that it might as well not exist (parsing logs with pl-analyze? Crude and barely useful. Scout? Nice creator, but the product is still a tot. Fiveruns? Oh wait, Fiveruns doesn’t exist anymore. Thank goodness) and 2) it is perhaps the most essential tool for running a production-quality Rails site. Every time I visit an ASP site and get the infinite spinner of doom when I submit a form, I think to myself, “they must not know that every time I submit a form it takes 60 seconds to return. That would suck.” On a daily basis, I probably only use 10% of the functionality in New Relic, but without that 10%, the time I’d spend tracking logs and calculating metrics would make my life unfathomably less fun.

1. Rubymine. The team that created this product is insane. Every time that I hit CTRL-O and I type in “or/n” and it pops up all files in the “offers_resolution” folder starting with the letter “n,” I know they are insane, because having that much attention to productivity is beyond what sane developers do. Again, for context’s sake, one has to consider the “next best” choice, which, for Linux (or Windows) is arguably a plain text editor (unless you don’t mind waiting for Eclipse to load and crash a few times per day). But, instead of programming like cavemen, we have a tool that provides killer function/file lookup; impeccable code highlighting and error detection (I had missed that, working in a non-compiled language); a working visual debugger; and, oh yeah, a better Git GUI than any of five standalone tools that were built specifically to be Git GUIs.

Perhaps as importantly as what Rubymine does is what it doesn’t do. It barely ever slows down, it doesn’t make me manage/update my project (automatically detecting new files and automatically adding new files to Git when I create them from within Rubymine), and it handles tabs/spaces like a nimble sorcerer (something that proved to be exceedingly rare in my quest for a usable IDE).

Like New Relic, I probably end up using only a fraction of the features it has available, but I simply can’t think of anything short of writing my code for me that Rubymine could do that it doesn’t already handle like a champ. Two thumbs up with a mutated third thumb up if I had one.

Conclusion

Yes, it is a list of apples and oranges, but the commonality is that all five-ish of the list’s members stand apart from the “second best” solution in their domain by a factor of more than 2x. All of them make me feel powerful when I use them. And all of them, except arguably New Relic, are free or bargain priced. Hooray for life after Microsoft. Oh how doomed are we all.

Rails Unified Application Logging with Log Weaver

The Problem

After adding our third app server a couple days ago, the appeal of digging through three separate production.log files when things go awry on Bonanzle was officially over.

Like many Rails developers in this situation, I Googled numerous terms in search of a solution, and most of these terms sent me to my good friend Jesse Proudman’s blog on using Syslog-ng to unify Rails logfiles. So we installed it (well, to be specific, we had Blue Box set it up for us, because it looked complicated), and determined that it was not what we were looking for. Installation issues aside (of which there were a few), the real killer when using Syslogger with Rails is that you lose the buffered production log output you have come to know and love, leaving your production logfiles a stew of mishmashed lines from numerous Passenger processes in numerous states of processing. In short, if you get an appreciable amount of traffic (and I’d imagine you do if you’re reading this in the first place), and you are a human being, you will not be able to read an unbuffered Rails log without considerable time and frustration.

The Solution

Exists on Github here.

Since it looked like Syslog was the only game in town currently for merging application server logs, I decided to spend the afternoon writing a plugin that would allow us to take an arbitrary number of production logfiles, from an arbitrary number of hosts, and merge them together into one file without changing the formatting of the production logs or affecting performance on the app servers.

The basic mechanism of this plugin is that it uses rsync to grab your production logs, then it boils those production logs down into hashes of { :time => action_time, :text => action text }. It then outputs the actions from all of your app servers into a single file in chronological order.
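
To make that concrete, here is a rough sketch of the merge step. This is not Log Weaver's actual code. It assumes each action begins with a Rails 2.x-style "Processing ..." line that carries a timestamp, the names parse_actions and weave are hypothetical, and it reads everything into memory for simplicity.

```ruby
require 'time'

# Assumed Rails 2.x-style entry header, e.g.
#   Processing ItemsController#show (for 10.0.0.1 at 2009-10-01 12:00:05) [GET]
ENTRY_START = /^Processing .* at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\)/

# Boil one log's text down to an array of { :time => ..., :text => ... } hashes.
def parse_actions(log_text)
  actions = []
  log_text.each_line do |line|
    if (match = ENTRY_START.match(line))
      actions << { :time => Time.parse(match[1]), :text => line.dup }
    elsif actions.any?
      actions.last[:text] << line # continuation of the current action
    end
  end
  actions
end

# Merge the actions from several logs into one chronologically ordered string.
def weave(*log_texts)
  all = log_texts.flat_map { |text| parse_actions(text) }
  # The index breaks ties so actions with equal timestamps keep their input order.
  sorted = all.each_with_index.sort_by { |action, index| [action[:time], index] }
  sorted.map { |action, _index| action[:text] }.join
end

app1 = "Processing ItemsController#show (for 10.0.0.1 at 2009-10-01 12:00:05) [GET]\n  Completed in 12ms\n"
app2 = "Processing OffersController#index (for 10.0.0.2 at 2009-10-01 12:00:01) [GET]\n  Completed in 30ms\n"
puts weave(app1, app2) # the OffersController action prints first
```

Because each action keeps its buffered lines together, the merged output stays readable in the way the stock production.log is.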

As a bonus, it also lets you specify the maximum size of your unified log file, and handles keeping the logfiles broken into bite-sized chunks, so that you can actually read the output afterwards (rather than ending up with a 5GB log file). This functionality is built in, and can be configured via an included YML file.

The remainder of this post will just quote from the Github project, which does a pretty fine job of explaining what’s going on:

Log Weaver Functionality
========================
Log Weaver v0.2 sports the following featureset:

* Sync as many production logfiles, from as many hosts as you want, into a single logfile on a single server
* Use your existing Rails production.log, no need to modify how it’s formatted
* Break up your unified log file into bite sized chunks that you specify from a YML file, so you don’t end up with a 10 GB unified logfile that you can’t open or use
* Does not run on or adversely affect performance of app servers. Uses rsync to grab production log files from app servers, then does the “hard work” of combining them on what is presumably a separate server.

Installation
============
Clone the log-weaver github project into your vendor/plugins directory. No models, database tables, or installation is needed for this plugin. Simply edit the /log-weaver/config/system_logs.yml file to specify the settings of your hosts.

Usage
=====
Run “rake log_weaver:weave_logs” to initiate the process of log merging.

When you run this task, Log Weaver will rsync your logfiles from the locations you specified in the YML file. The first time you run Log Weaver, this might take a minute or two, depending on how big your production logs are. On subsequent runs, it should be pretty instantaneous, since rsync is smart about merging files.

After rsyncing your production logs from your app servers, Log Weaver will build the production logs into bite-sized hashes and merge them in chronological order into the file that you specified in your YML file. Limited testing has shown that combining three logfiles that were each 1+ GB used less than 300MB of memory and completed in about 10 minutes.

Run “rake log_weaver:weave_logs” periodically to add to your unified log. When the size of your unified log exceeds the size you specified in the settings YML file, the unified log will be renamed “[log_name]_X.log” where X is the lowest integer of a file that doesn’t exist in your log directory. That is, if you named your file “unified.log,” Log Weaver would move the original log to “unified_2.log” and then open a new “unified.log” file to continue merging your logs.

Future Improvements
===================
Log Weaver was written over the course of a few hours to fit the baseline needs of Bonanzle, so there is surely plenty of room to improve! For starters, this would probably make more sense as a gem than a Rails plugin.

Feel free to fork and add whatever you think it needs and ping me to pull in your improvements and we can make this plugin a worthwhile thing.

Savage Beast 2.3, a Rails Message Forum Plugin

Savage Beast 2.0 has been the de facto solution for those looking to add a message forum to their existing Rails site, but it was created more than a year ago, and had many aspects that tied it to Rails 2.0. Also, it relied on the Engines plugin, which is not the most lightweight plugin. Although Engines doesn’t seem to affect performance, it did rub some people the wrong way.

After a year’s worth of promises that an update was “coming soon,” an update has finally arrived and is now available at Github.

Detailed instructions on getting it rolling with Rails 2.3 follow.

Installation

Currently, the following is necessary to use the Savage Beast plugin:

  1. The Savage Beast 2.3 plugin. Go to your application’s root directory and:
    script/plugin install git://github.com/wbharding/savage-beast.git
  2. Most of the stuff you need to run Beast…
    • RedCloth: gem install RedCloth. Make sure you add “config.gem 'RedCloth'” inside your environment.rb, so that it gets included.
    • A bunch of plugins (white_list, white_list_formatted_content, acts_as_list, gibberish, will_paginate). If you’re using Rails 2.2 or earlier, you’ll need the Engines plugin, if you’re on Rails 2.3, you don’t need Engines. The easiest way to install these en masse is just to copy the contents of savage_beast/tested_plugins to your standard Rails plugin directory (/vendor/plugins). If you already have versions of these plugins, you can just choose not to overwrite those versions
  3. Go to your application’s root directory and run “rake savage_beast:bootstrap_db” to create the database tables used by Savage Beast. If it happens you already have tables in your project with the names Savage Beast wants to use, your tables won’t be overwritten (though obviously SB won’t work without its tables). To see the tables Savage Beast uses, look in lib/tasks/savage_beast.rake in your Savage Beast plugin folder.
  4. Next run “rake savage_beast:bootstrap_assets” to copy Savage Beast stylesheets and images to savage_beast asset subdirectories within your public directory.
  5. Implement in your User model the four methods in plugins/savage_beast/lib/savage_beast/user_init that are marked as "#implement in your user model".
  6. Add the line “include SavageBeast::UserInit” to your User model. Location shouldn’t matter unless you intend to override it.
  7. Add the line “include SavageBeast::ApplicationHelper” to ApplicationHelper within your application_helper.rb file.
  8. Implement versions of the methods in SavageBeast::AuthenticationSystem (located in /plugins/savage_beast/lib) in your application controller if they aren’t already there (note: technically, I believe only “login_required” and “current_user” are necessary, the others give you more functionality). Helpful commenter Adam says that if you have the “helper :all” line in your application controller, be sure to add the “SavageBeast::AuthenticationSystem” line after that.

If you’re using Rails 2.0-2.2, and thus using the Engines plugin, you’ll need a couple extra steps:

  1. Add this line to the top of your environment.rb, right after the require of boot: require File.join(File.dirname(__FILE__), '../vendor/plugins/engines/boot')
  2. Move the routes.rb file from the “savage-beast/config” directory to the root (“savage-beast”) directory of the plugin. Then add the line “map.from_plugin :savage_beast” to your routes.rb. Location shouldn’t matter unless you intend to override it.

And off you go! When you visit your_site/forums something should happen. I’ve been creating new forums by visiting /forums/new. There’s probably a hidden admin view somewhere.

Implementing Your Own Views and Controllers

Just create a new file in your /controllers or /views directories with the same name as the file you want to override in Savage Beast. If you just want to override a particular method in a controller, you can do that piecemeal if you just leave your XController empty except for the method you wanted to override (Note: I know this piecemeal method adding works with the Engines plugin installed, but haven’t tested it without).

If you’re integrating this into an existing site, I’d recommend you start by creating a forums layout page (/app/views/layouts/forums.html.erb). This will give you a taste of how easy it is to selectively override files from the plugin.

Demo

You can check out a (slightly-but-not-too-modified) version of Savage Beast online at Bonanzle. The differences between our version and the version checked into Subversion are 1) addition of topic tagging (users can tag topics to get them removed, etc) 2) recent post list shows posts in unique topics, rather than showing posts from the same topic repeatedly (there’s another blog on here about the SQL I used to do that) and 3) skinning. None of those changes feel intrinsic to what SB is “supposed to do,” which is why they aren’t checked in.

Conclusion

Comments are most welcome. I’ll be checking in changes to the project as I find bugs and improvements in using it, but this is admittedly something I don’t have a lot of spare time to closely follow (see my other entries on the wonders of entrepreneurship). Hopefully others can contribute patches as they find time. If you like the plugin, feel free to stop by Agile Development and give it a rating so that others can find it in the future.

Rails Slave Database Plugin Comparison & Review

Introduction

Based on the skimpy amount of Google results I get when I look for queries relating to Rails slave database (and/or the best rails slave database plugin), I surmise that not many Rails apps grow to the point of needing slave databases. But we have. So I’ve been evaluating the various choices intermittently over the last week, and have arrived at the following understanding of the current slave DB ecosystem:

Masochism

Credibility: Was the first viable Rails DB plugin, used to rule the roost for Google search results. The first result for “rails slave database” still points to a Masochism-based approach.

Pros: Once-high usage means that it is the best documented of the Rails slave plugins. Seems pretty straightforward to initially setup.

Cons: The author himself has admitted (in comments) that the project has fallen into a bit of a state of disrepair, and apparently it doesn’t play nice with Rails 2.2 and higher. The github lists multiple monkey patches necessary to get it working. It only appears to work with one slave DB.

master_slave_adapter

Credibility: It’s currently the most watched slave plugin-related project I can find on github (with about 90 followers). Also got mentioned in Ruby Inside a couple months ago. Has been updated in last six months.

Pros: Doesn’t use as much monkey patching to reach its goals, therefore theoretically more stable than other solutions as time passes.

Cons: Appears to only handle a connection to one slave DB. I’m not sure how many sites grow to the point of needing a slave DB, but then expect to stop growing such that they won’t need multiple slave DBs in the future? Not us. There’s also less support here than the other choices for limited use of the slave DB. This one assumes that you’ll want to use the slave for all SELECTs in the entire app, unless you’ve specifically wrapped it in a block that tells it to use the master.

Db Charmer

Credibility: Used in production by Scribd.com, which has about 4 million uniques. Development is ongoing. Builds on acts_as_readonlyable, which has been around quite a while.

Pros: Seems to strike a nice balance between the multiple database capabilities of Seamless Database Pool (SDP) and the lightweight implementation of master_slave_adapter (MSA). Allows one or more slaves to be declared in a given model, or for a model to use a different database entirely (aka db sharding). Doesn’t require any proprietary database.yml changes. Didn’t immediately break anything when I installed it.

Cons: In my first hour of usage, it didn’t work. It seems to route most of its functionality through a method called #switch_connection_to, and that method doesn’t do anything (including raise an error) when I try to call it. It just uses our existing production database rather than a slave. The documentation for this plugin is currently bordering on “non-existent,” although that is not surprising given that the plugin was only released a couple of months ago. I emailed the plugin’s author a week ago to try to get some more details about it and never heard back.

Seamless Database Pool

Credibility: Highest rated DB plugin on the Agile Web Development plugin directory. Has been updated in the last six months.

Pros: More advertised functionality than any other slave plugin, including failover (if one of your slaves stops working, this plugin will try to use other slaves or your master). Documentation is comparatively pretty good amongst the slave DB choices, with rdoc available. Supports multiple slave databases, even allowing weighting of the DBs. And with the exception of Thinking Sphinx, it has “just worked” since dropping it in.
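For anyone unsure what weighting and failover mean in practice, the idea can be sketched in a few lines of plain Ruby. This is an illustration only, not SDP's implementation: the WeightedPool class, its methods, and the symbols standing in for connections are hypothetical.

```ruby
# Hypothetical sketch of weighted slave selection with failover to the master.
# Not Seamless Database Pool's code; class and method names are made up.
class WeightedPool
  def initialize(master, weighted_slaves)
    @master = master
    # Repeat each slave once per unit of weight, so a weight-2 slave is
    # picked about twice as often as a weight-1 slave.
    @slots = weighted_slaves.map { |slave, weight| [slave] * weight }.flatten
    @dead = []
  end

  # Mark a slave as unavailable so it is skipped from now on.
  def mark_dead(slave)
    @dead << slave unless @dead.include?(slave)
  end

  # Return a random live slave, or the master if no slave is left.
  def read_connection
    live = @slots - @dead
    live.empty? ? @master : live[rand(live.size)]
  end
end

pool = WeightedPool.new(:master, { :big_slave => 2, :small_slave => 1 })
pool.read_connection        # :big_slave about two times out of three
pool.mark_dead(:big_slave)
pool.read_connection        # always :small_slave now
pool.mark_dead(:small_slave)
pool.read_connection        # falls back to :master
```

A real pool would also need some way to bring a dead slave back once it recovers, which this sketch leaves out.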

Cons: Tried to index Thinking Sphinx and ran into difficulty since this plugin redefines the connection adapter used in database.yml*. The changes needed to database.yml (which are quite proprietary) make me suspicious that this may also conflict with New Relic (which detects the DB adapter in a similar manner to TS). Would be nice if it provided a way to specify the database on a per-model basis, like Db Charmer. Also, it would inspire more confidence if this had a GitHub project to gauge the number of people using it.

Conclusion

Unfortunately, working with multiple slave databases in Rails seems to be one of the “wild west” areas of development. It’s not uninhabited, but there is no go-to solution that seems ready to drop in and work with Rails 2.2 and above. For those running Rails 2.2+ and looking to use multiple slaves, Db Charmer and Seamless Database Pool are the two clear frontrunners. I like the simpler, model-driven style plus lack of database.yml weirdness of Db Charmer. But I really like the extra functionality of SDP. At this point, our choice will probably boil down to which one gives us the least hassle to get working, and that appears to be SDP, which worked immediately except for Thinking Sphinx.

I’ll be sure to post updates as I get more familiar with these plugins. Especially if it looks like there is any intelligent life out there besides me that is attempting to get this working.

Update 10/13: The more I use SDP, the more I’m getting to like it. Though I was initially drawn to the Db Charmer model-based approach to databases, I now think that the SDP action-based approach might make more sense. Rationale: Most of the time when we’re rendering a page, we’ll be using data from models that are deeply connected, i.e., a user has user_settings and extended_user_info models associated with it. We could end up in hot water if the user model used a slave, while the user_settings used the master and extended_user_info used a different slave, as would be possible with a model-based slave approach. SDP abstracts this away by ensuring that every SELECT statement in the action will automatically use the same slave database from within your slave pool.

Also, though I didn’t notice it documented at first, SDP is smart enough to know that even if you marked an action to read from the slave pool, if you happen to call an INSERT/UPDATE/DELETE within the action, it will still use the master.
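The action-based behavior described in this update can be sketched in plain Ruby as follows. Again, this is a rough illustration under my own assumptions rather than SDP's code: ActionScopedRouter, within_action, and the symbols standing in for connections are hypothetical names.

```ruby
# Hypothetical sketch: one slave per action for reads, master for writes.
# Not Seamless Database Pool's code; all names here are made up.
class ActionScopedRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @sticky_slave = nil
  end

  # Pick one slave and keep it for the whole block (one controller action).
  def within_action
    @sticky_slave = @slaves[rand(@slaves.size)]
    yield self
  ensure
    @sticky_slave = nil
  end

  # SELECTs inside an action go to the sticky slave; everything else,
  # including INSERT/UPDATE/DELETE inside the action, goes to the master.
  def connection_for(sql)
    if @sticky_slave && sql =~ /\A\s*SELECT\b/i
      @sticky_slave
    else
      @master
    end
  end
end

router = ActionScopedRouter.new(:master, [:slave_1, :slave_2, :slave_3])
router.within_action do |r|
  r.connection_for("SELECT * FROM users")               # some slave
  r.connection_for("SELECT * FROM user_settings")       # the same slave
  r.connection_for("UPDATE users SET name = 'x'")       # the master
end
```

Because the slave is chosen once per action rather than once per model, the user and user_settings reads above can never land on different replicas within a single request.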

* Thinking Sphinx will still start/stop with SDP, it just won’t index. Luckily for us, we are already indexing our TS files on a separate machine, so I’ll just set up the database.yml on the TS building machine to not use SDP, which ought to solve the problem for us. If you know of a way to get TS to index with SDP installed, please do post to the comments below.