EC2 vs. Heroku vs. Blue Box Group for Rails Hosting

Not to kick a company when it’s (literally) down, but today’s 12+ hour EC2 outage has finally driven me to write a blog post I’ve been holding in my head for several months now: comparing a few of the major players within today’s Rails hosting ecosystem. In this post, I’ll compare and contrast EC2, Heroku, and Blue Box Group. I’ve chosen these three not only because of their popularity, but also because I believe each has a value proposition distinct from the other two, which makes each ideally suited for different types of customers. In the interests of full disclosure, we are a current Blue Box Group customer, but we have spent a great deal of time looking at our choices, and I think that all have their advantages in specific situations. The facts and opinions below are the result of weighing data gleaned from Googling, popular opinion, and three years running a Rails site that has grown to serve more than 2m uniques per month.

Heroku vs. EC2 vs. Blue Box Group, Round 1: Price

Let’s start by establishing a pricing context. We’ll use a dedicated (not shared-resource/virtual) 8 GB, 8-core server with 1 TB of bandwidth as our baseline, since it is available across all three services (with some caveats, mentioned below).

                                     Heroku     EC2 (High-CPU XL)    BBG
Dedicated 8-core server with 8 GB    $800.00    $490.00              $799.00*
1 TB bandwidth                       $0.00      $125.00              $0.00
Total                                $800.00    $615.00-695.00**     $799.00

* In the case of BBG, their most current prices aren’t on their pricing page. They should fix that.

** This doesn’t include I/O costs for Amazon EBS. While these are nearly impossible to predict (they vary greatly from app to app), Amazon’s own guidance suggests you’d be talking about something more than $40 per month. Given that we’re comparing a “high end machine” here, $80 might be a more accurate estimate, which would make the price more like $700.

Various minor approximations were made to get this as close to apples-to-apples as possible, but the biggest caveat is that the Heroku instance (Ika) has only about 25% of the CPU of the EC2 and BBG instances, though it has the same amount of memory (Heroku doesn’t configure their DBs with comparable CPU muscle). The next highest Heroku instance (Zilla) is $1600 per month and more comparable to the other two in terms of CPU, but has twice as much memory as they do. Note that EC2 and BBG both offer discounts when committing to a year of service; I couldn’t find a comparable offer from Heroku, which is not to say that it doesn’t exist (readers?). These discounts typically range from 10-25% off the no-commitment price.

Heroku vs. EC2 vs. Blue Box Group, Round 2: Setup

Heroku is ridiculously easy to get started with, the runaway winner of the bunch when it comes to hitting the ground running with zero red tape. Per their homepage, all you do is run a couple of commands and you’re in business. Even cooler, they offer a vast and useful collection of add-ons that make it easy to get started on whatever the specific thing is that your app is supposed to do.
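
For flavor, the whole first deploy of that era boiled down to something like this (commands from memory of Heroku’s docs; illustrative rather than gospel):

gem install heroku        # the Heroku CLI gem
cd my_rails_app
heroku create             # provisions an app and adds a git remote
git push heroku master    # deploys; Heroku detects and builds the app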

Setting up Rails with EC2 is not quite the same walk in the park, but it’s not necessarily bad. Amazon handles configuring the OS for you, so in terms of getting your app server set up, you are essentially just getting Ruby and Rubygems installed, and letting Bundler take care of the rest. If you managed to set up your development environment on Linux or a Mac, chances are you won’t have too much trouble using packages to fill in the gaps for the non-Ruby elements of your application (like Sphinx). Where EC2 gets trickier is when you start figuring out how to integrate EBS (Amazon Elastic Block Store, necessary for data that you don’t want to disappear) and the other 20 Amazon web services that you may or may not want/need to use to run your app. It can ultimately amount to quite a research project to figure out which tools you want, which ones you need, and how to tie them all together. That said, you may end up using some of these tools (S3 in particular) even if you use BBG or Heroku, so that cost is not entirely unique to EC2.
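
As a rough sketch of that “fill in the gaps with packages” step, on an Ubuntu-flavored AMI it looks something like the following (purely illustrative; exact package names vary by distro and era):

sudo apt-get update
sudo apt-get install -y build-essential ruby ruby-dev rubygems libmysqlclient-dev
sudo gem install bundler
cd /var/www/my_app && bundle install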

Ease of getting started on Blue Box is somewhere in between EC2 and Heroku. There is no high-tech set of tools that automatically builds stuff for you as with Heroku, but unlike EC2, you have a friendly and qualified team willing to help you get your server set up in the best possible way. In my experience, when they have set up new servers, they ask in advance how we plan to use the server, and then automatically handle installing all of the baseline stuff we’ll need, such that we can just focus on deploying our app. Which brings me to Round 3…

Heroku vs. EC2 vs. Blue Box Group, Round 3: What’s Best Suited for What?

For pet projects, small sites, or newly started sites, I think that hosting with Heroku is a no-brainer. You can be up and running immediately, you get a huge variety of conveniences with their add-ons, and there is a wealth of Google help behind you should you happen to encounter any trouble, given the immense user base Heroku has managed to establish. All three of these services can scale up available hardware within minutes or hours, not days (yay for clouds!), but Heroku’s “Dynos” approach is probably the most straightforward way to rapidly grow an application. However, given that theirs is the highest cost amongst the choices, and that their application-specific support trails BBG’s, the significance of those advantages will erode as your application grows into the tens of thousands of monthly visitors.

I believe that EC2’s greatest selling point is its price, with its scalability and ubiquity (= generally good Googlability) being close seconds. As detailed above, on balance, EC2 tends to run 20-30% cheaper than the other choices by leveraging Amazon’s immense scale. Nifty features like auto-scaling promise to make instant growth possible if you get flooded after your Oprah appearance. The trade-off for those advantages is that you will get zero application-specific support, and even getting generic system-wide support can be hit-and-miss, as folks who suffered through today’s EC2 outage learned firsthand. Transparency is not Amazon’s strong suit at this point in their evolution, which can be a real problem if you have real customers who depend on your product and want to know when they can expect to see your website again during an outage. Also, as mentioned in the setup section, finding your way around the Amazon product ecosystem can be daunting at first.

I would consider EC2 the best choice for intermediate-sized businesses, particularly if 100% uptime is not imperative to their existence. EC2 is a great option for bootstrapped startups that want to get online as cheaply as possible and are willing to put in the extra work setting up their servers in exchange for those cost savings. Also, since you will probably be unclear about what kind of resources your app will consume as it scales, EC2 is a great proving ground to get a sense for what you might need if you decide to venture beyond Amazon for improved reliability and service. I would also take a long look at EC2 for huge businesses that can afford their own IT department, which diminishes the significance of EC2’s lack of application-specific support and monitoring.

While their prices are competitive with EC2, I would assert that the real differentiator with Blue Box is their focus on service. Included by default for business-sized BBG customers is 24/7 proactive human monitoring of all services, including the ability to bring servers back online if they should happen to crash while you’re not around. Having gone through a fair number of web hosts in our day, we have come to realize that, once you are signed up at a given host, it can be a huge pain to change. Most hosts use this knowledge to their advantage, and after a very romantic honeymoon period, become inattentive to their customers’ needs once it becomes clear the customer would be hard-pressed to move.

At Blue Box, their customer-focused attitude has not diminished a bit over time. We still regularly find them answering our questions…on the weekend…within minutes of the question being sent. Equally important, Jesse Proudman (BBG CEO) has built a team around him that gives the customer the benefit of the doubt. In more than a year of being hosted at BBG, I can’t remember them ever “blaming” us for server changes we’ve made that have caused havoc (not the case at some of our past hosts). Instead, BBG has a solution-focused team that is consistently personable, reasonable, and most importantly, effective when it comes to solving tricky application and server problems.

While BBG offers small VPS instances, as well as cloud servers that can quickly scale, I consider their sweet spot to be businesses that have grown beyond the point of being able to easily maintain their server cluster themselves, but don’t want to hire an on-staff IT guy. Or maybe they do have an IT guy, but they really need two. Over the past couple of years, BBG has been our “IT guy,” implementing systems for us ranging from a Mediawiki server, to load balancers, to Mysql master-master failover clusters. And compared to having a real IT guy on staff, the price is a huge bargain (not to mention the savings on health insurance, taxes, etc.)

Another nice benefit for those that have been stung by EC2/Heroku uptime hiccups: in 20 months with BBG, our total system downtime has been somewhere between one and two hours (excluding downtime caused by our own mistakes).

Conclusion

The best host for a particular Rails app depends on a number of factors, including phase of development (pre-launch? newly launched? rapidly growing? already huge?), need for 100% uptime, makeup of team, and cash available. Hopefully this post will be helpful to someone trying to figure out which host makes most sense for their unique situation. Please do post any questions or anecdotes from your hosting experience for future Google visitors.

Update: Response from BBG

Blue Box emailed me after this post, with a few extra details that I believe are pertinent:

  • 4 x 300GB 15k SAS outperforms EBS (which both Heroku and Amazon rely on) by almost 30% based on our client benchmarks.
  • Neither Heroku nor Amazon provides 24 x 7 monitoring with proactive human response. This is a *key* differentiator, particularly when comparing costs.
  • All of our dedicated database instances are running bare metal, meaning you gain consistent and reliable performance, and aren’t subject to similar massive outages caused by the multi-tenancy of a SAN.

If anyone from Amazon or Heroku would like to provide extra details of what makes them a strong choice, I’d be only too happy to post those as well.

Nokogiri Recursively Find Children/Child from a Parent

Contrary to the pages of complex hand-written recursive methods I found on StackOverflow when Googling this, it is actually as simple as

  require 'nokogiri'

  # Note: Nokogiri::XML wants XML content (a String or IO), not a filename
  noko = Nokogiri::XML(File.read("my_noko_file.xml"))
  parent_node = noko.root.xpath("//MyNodeName")
  children_named_floyd = parent_node.xpath(".//Floyd")

If you want to search on more complex criteria, you can also add in extra sauce to your xpath.

  noko = Nokogiri::XML(File.read("my_noko_file.xml"))
  # Searches your entire XML tree for an XML node of type "MyNodeName" that has
  # an attribute "id" set to a value of '1234', then grabs the XML node of type
  # "Something" from within the found NodeSet
  parent_node = noko.root.xpath("//MyNodeName[@id='1234']").at("Something")
  # Grab all children of the "Something" node that are of type "Floyd"
  children_named_floyd = parent_node.xpath(".//Floyd")

Nokogiri is a great gem. But I do often wish its docs had more examples and fewer byzantine explanations for common operations like these. In the meantime, let’s hope Google will continue to fill in the gaps.

Rails 3 Slave Databases: Compare Octopus to SDP

In the crazy wild days of Rails 2.x…

In the pre-Rails 3 ecosystem, there were a number of confusingly similar choices for getting master/slave database functionality established. These options included Masochism, DB Charmer, master_slave_adapter, and seamless_database_pool, amongst others. When it came time for Bonanza to choose a slave plugin, I made my best effort to assess the velocity and functionality of each of the prominent slave database solutions, and wrote what went on to become a fairly popular post comparing the relative strengths of each choice.

Octopus

Fast forward to Rails 3, and the field has narrowed considerably. Almost all of the top Google results for Rails slave database options these days point to Octopus, and with good reason. Its documentation is sound, and its GitHub project has maintained good velocity for the better part of the past year. Reading between the lines of the Octopus documentation, it would seem that it was built first and foremost as a tool to make it stupidly easy to shard databases; secondarily, it also supports using slave databases in a non-sharding format, but the implementation here gets a little sketchier, as the examples show users needing to explicitly declare a given slave database for a particular query. In the documentation, this is done at query time, e.g.,

User.where(:name => "Thiago").limit(3).using(:slave_one)

or

Octopus.using(:slave_two) do
  User.create(:name => "Mike")
end

Seamless Database Pool

Upon learning about Octopus, my natural inclination was to compare it to our current solution, seamless_database_pool. Admittedly, when we got to the Rails 3 party, SDP was running a bit behind. The author had been kind enough to do much of the legwork to get it compliant with AR3, but we still encountered errors when actually trying to use the plugin within controllers and views the way we had with the previous version.

So I fixed it.

What Seamless Database Pool now represents is a slave database plugin specifically built to make it as easy as possible to A) connect to one or more weighted slave databases, B) declare whether a particular Rails action should attempt to use slaves, master, or both (automatically defaulting to the master when write operations occur), and C) gracefully handle failover if one or more of the declared slave databases becomes unavailable for whatever reason.

SDP does not have any built-in support for sharding, so if that is what your DB needs, Octopus is your best bet. But if what you need is specifically a Rails 3-supported solution that will allow you to mix and match your main database and N slaves, in a weighted way and with failover automatically baked in, this is where seamless_database_pool really shines.
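
To give a flavor of (B), per-action pool selection in SDP looks roughly like this (adapted from my memory of the SDP README; check the project docs for the exact option names):

class ItemsController < ApplicationController
  include SeamlessDatabasePool::ControllerFilter

  # Reads can come from the weighted read pool; anything that writes
  # sticks to the master connection.
  use_database_pool :all => :persistent, [:create, :update, :destroy] => :master
end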

Bonanza has been using SDP in production for more than a year now, and in that time we have experienced failures of our slave database every few months, which at one point would have brought down the entire site. Now, within seconds, Rails figures out that it needs to re-route requests and finds a database it can use that is still available. The still-good SDP documentation describes how to make it happen.

Bottom line

Prior to writing this blog post, if you Googled “master/slave database” you would probably come away thinking there was only one solution, and that solution was only secondarily focused on allowing N slaves to be configured. I may be wrong about the level of support that Octopus already has for setting up multiple weighted failover slaves (and for declaring usage of these on a per-action rather than per-query basis), but the documentation makes me think that this is at best a future roadmap feature. In the meantime, if it’s specifically slave database support you need, try the drag-and-droppable SDP gem. I will continue linking my fork of the project until the original author decides what he wants to do with my pull request (which fixes fundamental issues with Rails 3 controller integration, plus adds more robust slave failover).

Installation

Is as easy as possible. In your Bundler Gemfile:

gem "seamless_database_pool", :git => "git://github.com/wbharding/seamless_database_pool.git"

Your database.yml file will then look something like:

production:
  adapter: seamless_database_pool
  port: 3306
  username: app_user
  password: app_pass
  pool_adapter: mysql
  master:
    host: 1.2.3.4
    pool_weight: 0 # 0 means we only use master for writes if the controller action has been setup to use slaves
  read_pool:
    - host: 2.3.4.5
      username: slave_login
      password: slave_pass

Do drop a line in the comments with any questions or feedback if you have experience with either SDP or Octopus as solutions for Rails slave database support!

Rails 3 Performance: Abysmal to Good to Great

So you’ve upgraded from Rails 2.x to Rails 3 and you’re not happy with the response times you’re seeing? Or you are considering the upgrade, but don’t want to close your eyes and step into the great unknown without having some idea what to expect from a performance standpoint? Well, here’s what.

Performance in Rails 2.2

Prior to upgrading, this action (one of the most commonly called in our application) averaged 225-250 ms.

Rails 3.0.4 Day 1

On our first day after upgrading, we found the same unmodified action averaging 480ms. Not cool.

Thus began the investigative project. Since New Relic was not working with Rails 3 views this week (seriously; that’s a whole different story), I (sigh) headed into the production logs, which I was happy to discover actually broke out execution times by partial. But there seemed to be an annoyingly inconsistent blip every time we called this action, where one of the partials (which varied from request to request) would have something like 250-300 ms allocated to it.

I casually mentioned this annoyance to Mr. Awesome (aka Jordan), who speculated that it could have something to do with garbage collection. I’d heard of Ruby GC issues from time to time in the past, but never paid them much mind, since I assumed that, because we were already using Ruby Enterprise Edition, the defaults would likely be fine. But given my lack of other options, I decided to start investigating. That’s when I discovered this document from those otherworldly documenters at Phusion, describing the memory settings that “Twitter uses” (quoted because it is probably years old) to run their app. Running low on alternatives, I gave it a shot. Here were our results:

[Chart: Rails 3.0.4 response times with the “Twitter” memory settings]

New average time: 280ms. Now that is change we can believe in! Compared with the default REE settings, we were running more than 40% faster, practically back to our 2.2 levels. (Bored of reading this post and want to go implement these changes immediately? This is a great blog describing how. Thanks, random internet dude with broken commenting system!)
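
For reference, the settings in question are environment variables set before REE boots, along these lines (these are the commonly cited “Twitter” values; verify against the Phusion document before copying):

export RUBY_HEAP_MIN_SLOTS=500000
export RUBY_HEAP_SLOTS_INCREMENT=250000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
export RUBY_GC_MALLOC_LIMIT=50000000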

That was good. But it inspired Jordan to start tracking our garbage collection time directly within New Relic (I don’t have a link to help ya there, but Google it; I assure you it’s possible, and awesome), and we discovered that even with these changes, we were still spending a good 25-30% of our time collecting garbage in our app (an improvement over the 50-60% from when we initially launched, but still). I wondered: could we get rid of GC during requests altogether by pushing our garbage collection to happen between requests rather than during them?

Because every director will tell you that audiences love a good cliffhanger, I’ll leave that question for the reader to consider. Hint: after exploring the possibility, our action is now faster in Rails 3 than it was in Rails 2. It’s all about the garbage, baby.
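
If you’d rather not wait for the sequel, the general shape of the trick is “out-of-band” GC: collect between requests instead of during them. A minimal sketch, as a Rack middleware (illustrative only; Unicorn ships a battle-tested take on this idea as Unicorn::OobGC):

class OutOfBandGC
  def initialize(app, interval = 10)
    @app = app
    @interval = interval
    @requests = 0
  end

  def call(env)
    status, headers, body = @app.call(env)
    @requests += 1
    # Collect after the app has done its work, so no user query pays the GC tax.
    # (A truly out-of-band version hooks in after the response body is flushed,
    # which is what Unicorn::OobGC does.)
    GC.start if @requests % @interval == 0
    [status, headers, body]
  end
end

# config/application.rb: config.middleware.use OutOfBandGC, 10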

Free Lock on Single Mysql Table Row

We ran into a problem today where a single row in one of our tables seemed to be stuck in a state where any query that tried to update it would hit our lock wait timeout of 50 seconds. I googled and googled to try to figure out a straightforward way to release this lock, but the closest things I could find were a big-assed Mysql page on table locks that lacked any specific solutions, and this Stack Overflow post that suggests fixing a similar problem by dropping the entire table and re-importing it (uh, no thanks).

After some trial and error, I came up with two viable ways to track down and fix this problem.

The first way is to actually look at Mysql’s innodb status by logging into your Mysql server and running

show innodb status\G

This will list any locks Mysql knows about and what it’s trying to do about them. In our case, the locked row did not show up in the innodb status output, so instead I executed

show processlist;

This listed everything that currently had a connection open to Mysql, and how long its connection has been open. In Rails it is a bit hard to spot which connection might be to blame, since every Rails instance leaves its connection open, whether it is waiting for a transaction to complete or doing nothing at all. In today’s case, I happened to have a good hunch about which of the 50 connections might be the problem (even though it was listed as being in the “sleep” state…strangely), so I killed it by restarting the server, and all was well. However, I could have also killed it using:

kill [process id];

If you don’t happen to know which of your processes holds the lock, the only recourse I know of is to restart your servers and see which Mysql processes remain open after the servers have reset their connections. If a process stays connected when its parent has left, then it is your enemy, and it must be put down. Hope this methodology helps someone and/or my future self.

Rails 3 Unit and Integration Tests Slow: So Fix It

Another topic for which there is no shortage of Google results, but none of them really helped us figure out why it takes upwards of three minutes for our tests to start. I finally reached my breaking point today and decided to track down what was behind the painful slowness of our test invocation.

The answer, for us, was the time it takes to purge and re-load the database when tests start. To figure this out, I simply ran

rake test --trace

And observed as we sat for two minutes on db:schema:load. Ouch.

The good news is that, since we don’t use fixtures (chill runs down spine at the thought), there is really no need for us to purge our test DB every time we want to invoke tests. So I decided to hack together a patch that allows tests to begin without the usual purge/load dog and pony show. If that’s the sort of thing you’d be into, here’s the code you need to make it happen (in Rails 3; not sure how this might vary for Rails 2):

Rake::TaskManager.class_eval do
  def remove_task(task_name)
    @tasks.delete(task_name.to_s)
  end
end

Rake.application.remove_task('db:test:load')

namespace :db do
  namespace :test do
    # Re-creating the DB every time we run tests sux...
    task :load do
    end
  end
end

Basically, the first bit undefines the existing “db:test:load” task, and the last part redefines db:test:load to do nothing. Add this as a file in your config/initializers directory and you should be golden.

If you do want to re-create your test database on occasion, you can still do that by running

  rake db:test:purge RAILS_ENV=test
  rake db:schema:load RAILS_ENV=test

Chances are that the RAILS_ENV isn’t even necessary there, but I don’t really feel like testing whether or not it deletes my main database atm.

Using this, our test start time dropped from about 3 minutes to 10 seconds. And that’s change that we can believe in.

Rails 3 Autoload Modules/Classes – 3 Easy Fixes

Judging by the number of queries on Stack Overflow, a very common problem for folks using Rails 3 is not getting the modules or classes from their lib directory loaded, especially in production. This leads to errors like “NoMethodError,” “No such file to load [file in your lib path],” and “uninitialized constant [thing you’re trying to load].” From the information I canvassed while trying to solve the problem this morning, there are three common reasons (and one uncommon reason) that this can happen. I’ll go in order of descending frequency.

1. You haven’t included the lib directory in your autoload path

Rails 3 has been updated such that classes/modules (henceforth, C/M) are lazy loaded from the autoload paths as they are needed. A common problem, especially for the newly upgraded, is that the C/M you are trying to load isn’t in your autoload path. Easy enough fix:

config.autoload_paths += Dir["#{config.root}/lib/**/"]

This will autoload your lib directory and any subdirectories of it. Similarly, if you have model subdirectories, you may need to add

config.autoload_paths += Dir["#{config.root}/app/models/**/"]

to your application.rb file.

2. Your C/M is not named in the way Rails expects

Rails has opinions about what the name of your C/M should be, and if you name it differently, your C/M won’t be found. Specifically, the namespace for your C/M should match the directory structure that leads to it. So if you have a C/M with the filename “search.rb” in the lib/google directory, its name should be “Google::Search.” The name of the C/M should match the filename, and the name of the directory (in this case “google”) should be part of the namespace for your module. See also this.
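
Concretely, the convention pairs paths with namespaces like so (using the hypothetical Google::Search example from above):

# lib/google/search.rb
module Google
  class Search
    # Rails can now lazily resolve the constant Google::Search,
    # because the path lib/google/search.rb matches the name
  end
end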

3. Your application is threadsafe.

This one was hard to find on SO or Google, but if you have configured your application to be threadsafe (done by default in production.rb), then your C/Ms, even if they exist in your autoload path, will not be defined until you require them in your application. According to the Rails docs, there is apparently something not threadsafe about automatically loading C/Ms from their corresponding files. To work around this, you can either require your individual library files before you use them, or comment out config.threadsafe!
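
In other words, pick one of these two (the require line continues the hypothetical Google::Search example from section 2):

# Option A: config/environments/production.rb
# config.threadsafe!   # commented out, restoring lazy autoloading

# Option B: keep config.threadsafe! and require files explicitly,
# e.g. at the top of the code that uses the class:
require 'google/search'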

4. (Esoteric) You have a model in a subdirectory, where the name of the subdirectory matches the name of the model.

Yes, this is another error we actually hit when upgrading from Rails 2 to 3: as of Rails 3.0.4 there is a bug wherein, if you have a directory app/models/item and a file “item.rb” inside that app/models/item directory, then you will get errors from Rails that your models are undefined (usually it picks the second model in the item subdirectory). We fixed this by renaming our subdirectories where appropriate. Hopefully this will get fixed in one of the next versions of Rails.

Had other reasons why C/Ms failed to load for you? Do post below and we can gather up all the possibilities in one tidy location.

Traits & Qualities of Best Developers on a Deserted Island

What is programming perfection? While the qualities that comprise “the most productive” developer will vary to some extent based on industry and role, I believe there are a number of similarities that productive developers tend to share. Understanding these similarities is key to improving oneself (if oneself == developer) or to differentiating yourself from a pool of similar-seeming developer prospects. In my assorted experience (full-time jobs in Pascal, C, C++, C#, and now Ruby (damn, I sound old)), I have repeatedly observed developers who vary in productivity by a factor of 10, and I believe it is a worthwhile exercise to try to understand the specifics behind this vast difference.

<OBLIGATORY DISCLAIMER>

I do not profess to be kid-tested or mother-approved (let alone academically rigorous or scientifically proven) when it comes to prescribing what exactly these qualities are. But I do have a blog, an opinion, and an eagerness to discuss this topic with others. I realize this is highly subjective, but that’s part of what makes it fun to try to quantify.

</OBLIGATORY DISCLAIMER>

As a starting point for discussion, I hereby submit the following chart which approximates the general progression I’ve observed in developers that move from being marginally productive to absurdly productive. It works from the bottom (stuff good junior programmers do) to the top (stuff superstars do).

[Chart: Levels of Developer Productivity]

And here is my rationale, starting from the bottom:

Level 1: Qualities possessed by an effective newbie

Comments on something, somewhere. A starting point for effective code commenting is that you’re not too lazy or obstinate to do it in principle.

Self-confident. Programmers who lack self-confidence can never be extremely effective, because they spend so much time analyzing their own code and questioning their lead about it. That said, if you’re learning, it’s far better to know what you don’t know than to think you know when you really don’t. So this one can be a difficult balance, particularly as coders grow in experience and become less willing to ask questions.

Abides by existing conventions. This is something that new coders actually tend to have working in their favor over veterans. On balance, they seem more willing to adapt to the conventions of existing code, rather than turning a five-person programming project into a spaghetti codebase with five different styles. A codebase with multiple disparate conventions is a subtle and insidious form of technical debt. Programmers starting out are usually pretty good at avoiding this, even if their reason is simply that they don’t yet have their own sense of style.

Doesn’t stay stuck on a problem without calling for backup. This ties into the aforementioned danger of being self-confident. It is another area where young programmers tend to do pretty well, while more experienced coders can sometimes let themselves get trapped more frequently. But if you can avoid this when you’re starting out, you’re getting off on the right foot.


Level 2: Qualities possessed by an effective intermediate programmer

Understands the significance of a method’s contract. Also known as “writes methods that don’t have unexpected side effects.” This basically means that the developer is good at naming their functions/methods, such that a function/method does not affect the objects passed to it in a way that isn’t implied by its name. For example, when I first started coding, I would write functions with names like “inflictDamage(player)” that might reduce a player’s hitpoints, change their AI state, and change the AI state of the enemies around the player. As I became more experienced, I learned that “superfunctions” like this were not only impossible to adapt, but very confusing when read by another programmer from their calling point: “I thought it was just supposed to inflict damage, why did it change the AI state of the enemies?”
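
A contrived sketch of the difference (hypothetical game code, not from any real project):

# The "superfunction": does far more than its name implies
def inflict_damage(player)
  player.hitpoints -= 10
  player.ai_state = :stunned                                # surprise!
  player.nearby_enemies.each { |e| e.ai_state = :alerted }  # bigger surprise
end

# A method that honors its contract: the name tells the whole story
def inflict_damage(player, amount)
  player.hitpoints -= amount
end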

Figures out missing steps when given tasks. This is a key difference between a developer who is a net asset and one who is a net liability. Oftentimes, a level 0 developer will appear to be getting a lot done, but their productivity depends on more advanced programmers whom they consult whenever they confront a non-trivial problem. As a lead developer, I would attempt to move my more junior developers up the hierarchy by asking “Well, what do you think?” or “What information would you need to be able to answer that question?” This was usually followed by a question like, “So what breakpoint could you set in your debugger to get the information you need?” Helping them figure it out themselves builds confidence, while implicitly teaching that breaking their teammates’ “mental context” is not an efficient way to solve problems they have the power to solve themselves.

Consistently focused, on task. OK, this isn’t really one that people “figure out” at level two, so much as a quality that usually accounts for a 2x difference in productivity between those who have it and those who don’t. I don’t know how you teach this, though, and I’d estimate that about half of the programmers I worked with at my past jobs ended up spending 25-50% of their day randomly browsing tech sites (justified in their heads as “research”?). Woe is our GDP.

Handy with a debugger. Slow: figure out how a complicated system works on paper or in your head. Fast: run a complicated system and see what it does. Slow: carefully consider how to adapt a system to make it do something tricky. Fast: change it, put in a breakpoint, does it work? What variables cause it not to? Caveat: You’ve still got to think about edge cases that didn’t naturally occur in your debugging state.


Level 3: Hey, you’re pretty good!

Thinks through edge cases / good bug spotter. When I worked at my first job, I often felt that programmers should be judged equally on the number of their own bugs they fixed and the number of other programmers’ bugs they found. Of course, there are those who will make the case that bug testing is the job of QA. And yes, QA is good. But QA can never be as effective as a developer who naturally has a sense for the edge cases that could affect their code, and so doesn’t write those bugs in the first place. And getting QA to try to reproduce intermittent bugs is usually no less work than just examining the code and thinking about what might be broken.

Unwilling to accept anomalous behavior. Good programmers learn that there is a serious cost to “code folklore”: the mysterious behaviors that have been vaguely attributed to a system without understanding the full root cause. Code folklore eventually leads to code paranoia, which can cause an inability to refactor, or a reluctance to touch that one thing that is so ugly, yet so mysterious and fragile.

Understands foreign code quickly. If this post is supper, here’s the steak. Weak developers need code to be their own to be able to work with it. Proficient coders are proficient because, for the 75% of the time that they are not working in code they recently wrote, they can figure out what’s going on without hours or days of rumination. More advanced versions of this trait are featured in level four (“Can adapt foreign code quickly”) and level five (“Accepts foreign code as own”). Stay tuned for the thrilling conclusion on what the experts can do with code that isn’t theirs.

Doesn’t write the same code twice. OK, I’ve now burned through at least a thousand words without being a language zealot, so please allow me this brief lapse into why I heart introspective languages: duplicated code is the root of great evil. The obvious drawback of writing the same thing twice is that it takes longer to write, and longer to adapt. The more sinister implication is that, if the same method is implemented in three different ways, paralysis often sets in, and developers become unwilling to consolidate the methods or figure out which is the best one to call. In a worst-case scenario, they write their own method, and the spiral into madness is fully underway.

Level 4: You’re one of the best programmers in your company

Comments consistently on code goals/purpose/gotchas. Bonus points for examples. Notice that I haven’t mentioned code commenting since level one? That is out of my begrudging respect for the Crazed Rubyists who espouse the viewpoint that “good code is self-documenting.” In an expressive language, I will buy that to an extent. But to my mind, there is no question that large systems necessarily have complicated parts, and no matter how brilliantly you implement those parts, the coders who follow you will take longer to assimilate them if the docs are thin. Think about how you feel when you find a plugin or gem that has great documentation. Do you get that satisfying, “I know what’s going on and am going to implement this immediately” sort of feeling? Or do you get the “Oh God, this plugin looks like exactly what I need, but it’s going to take four hours to figure out how to use the damned thing” feeling? An extra hour spent by the original programmer to throw you a bone would have saved you (and countless others) that time. Now do you sympathize with their viewpoint that their code is “self-documenting”?

Can adapt foreign code quickly. This is the next level of “understands foreign code quickly.” Not only do you understand it, but you know how to change it without breaking stuff or changing its style unnecessarily. Go get ’em.

Doesn’t write the same concept twice. And this is the next level of “doesn’t write the same code twice.” In a nutshell, this is the superpower that good system architects possess: a knack for seeing patterns and similarities across a system, and knowing how to conceptualize that pattern into a digestible unit that is modular, and thus maintainable.

Level 5: Have I mentioned to you that we’re hiring?

Constant experimenting for incremental gains. The best of the best feel an uncomfortable churn in their stomach if they have to use “find all” to get to a method’s definition. They feel a thrill of victory if they can type “hpl/s” to open a file called “show” in the “hand_picked_lists” directory (thank you, Rubymine!). They don’t settle for a slow development environment, build process, or test suite. They cause trouble if they don’t have the best possible tools to do their job effectively. Each little thing the programming expert does might only increase their overall productivity by 1% or less, but since developer productivity is a bell curve, those last few percent ratchet the expert developer from the 95th to the 99th percentile.

Accepts foreign code as their own. OK, I admit that this is a weird thing to put at the top of my pyramid, but it’s so extremely rare and valuable that I figure it will stand as a worthwhile challenge to developers, if nothing else. Whereas good developers understand others’ code and great developers can adapt it, truly extraordinary developers like foreign code just as much as they like their own. In a trivial case, their boss loves them because rather than complaining that code isn’t right, they will make just the right number of revisions to improve what needs to be improved (and, pursuant to level 3, they will have thought through edge cases before changing). In a more interesting example, they might take the time to grok and extensively revise a plugin that most developers would just throw away, because the expert developer has ascertained that it will be 10% faster to make heavy revisions than to rewrite from scratch. In a nutshell, whereas most developers tolerate the imperfections in code that is not their own, experts empathize with how those imperfections came about, and they have a knack for figuring out the shortest path to making the code in question more usable. And as a bonus, they often “just do it” sans the lamentations of their less productive counterparts.

This isn’t to say they won’t revise other people’s crappy code. But they’re just as likely to revise the crappy code of others as they are their own crappy code (hey, it happens). The trick that these experts pull is that they weigh the merits of their own code against the code of others with only one objective: what will get the job done best?

Teach me better

What qualities do you think are shared by the most effective developers? What do you think are the rarest and most desirable qualities to find? I would love to hear from a couple of meta-thinkers to compare notes on the similarities you’ve observed amongst your most astoundingly productive colleagues.

Rails: Beware the custom truncate

Ran into an interesting bug a few days ago that I’ve been meaning to document for anyone who has written their own truncating function in Rails. If you have, I’d guess it probably looked something like this:

def truncate(str, length)
	return '' if str.blank?
	truncated = str.size > length
	(str[0..(truncated ? length - 3 : length)] + (truncated ? "..." : ''))
end

Fine enough. Unless you happen to be truncating user-given strings.

If you are letting your users enter the information that gets truncated, chances are that some of them are entering unicode characters for accents, quotation marks, etc. Because these unicode characters are 2-4 bytes long, the above truncate will split characters in half and cause general headaches whenever it truncates text that has unicode characters in it.

Split-in-half characters are bad news. They will cause errors like “illegal JSON output,” which is how I originally spotted this as a problem with our truncate method.
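
You can see the failure mode in a console (Ruby 1.8 semantics, where string indexing is byte-based; “café” is just an illustrative input):

str = "café"   # the "é" is two bytes in UTF-8 (0xC3 0xA9)
str.size       # => 5, because Ruby 1.8 counts bytes, not characters
str[0..3]      # => "caf\303" -- a half-character, and invalid UTF-8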

The solution is to take a page from Rails’ own truncate, and use #mb_chars. So a hand-written truncate that works for unicode would be:

def truncate(str, length)
	return '' if str.blank?
	truncated = str.size > length
	(str.mb_chars[0..(truncated ? length - 3 : length)] + (truncated ? "..." : '')).to_s
end

You’re welcome.

Join Multiple Copies of Same Model in Thinking Sphinx Index

This was a vexing problem that probably affects 0.1% of all Thinking Sphinx users, but those select few can benefit from my pain.

We have a model with the following associations:

has_many :merch_match_data_tags, :class_name => "MerchMatchItemData", :dependent => :delete_all
has_one :mm_bizarre, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::BIZARRE }
has_one :mm_good_picture, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::NICE_PICTURE }
has_one :mm_funny, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::FUNNY }

Intending to add these to a Sphinx index, we used the following code:

has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_funny.tag_count, :as => :funny_points

What perplexed me after trying this was that, while “good_picture_points” could be queried and sorted, bizarre_points and funny_points returned no Sphinx results. Looking at the output generated by thinking_sphinx:configure, I discovered why:

...
LEFT OUTER JOIN `merch_match_item_datas` ON merch_match_item_datas.item_id = items.id
  AND `merch_match_item_datas`.`data_tag_id` = 0
LEFT OUTER JOIN `merch_match_item_datas` mm_bizarres_items ON mm_bizarres_items.item_id = items.id
  AND `merch_match_item_datas`.`data_tag_id` = 2
LEFT OUTER JOIN `merch_match_item_datas` mm_good_values_items ON mm_good_values_items.item_id = items.id
  AND `merch_match_item_datas`.`data_tag_id` = 7
LEFT OUTER JOIN `merch_match_item_datas` mm_funnies_items ON mm_funnies_items.item_id = items.id
  AND `merch_match_item_datas`.`data_tag_id` = 4
...

The problem was that, in determining the SQL to build, Thinking Sphinx uses the first association it comes across as the default set of conditions for all future joins to the table. So, in this case, anything that joined the merch_match_item_datas table was going to join with the data_tag_id = 0 condition of our first declared association (mm_good_picture). That is, mm_bizarre was now looking for data_tag_id=0 and data_tag_id=[id of bizarre tag]. So, nothing was returned.

After a bit of head scratching, I came up with the following workaround for this:

has merch_match_data_tags.tag_count
has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_good_value.tag_count, :as => :good_value_points

Basically, just make the first association that Thinking Sphinx encounters an unqualified, unfiltered association to the merch_match_item_datas table. This ensures that the proper join structure is set up, so all of the subsequent has attributes function as they should.

Hope that I’m not the only one ever to find this useful.