Traits & Qualities of Best Developers on a Deserted Island

What is programming perfection? While the qualities that comprise “the most productive” developer will vary to some extent based on industry and role, I believe there are a number of similarities that productive developers tend to share. Understanding these similarities is a key to improving oneself (if oneself == developer) or differentiating from a pool of similar-seeming developer prospects. In my assorted experience (full time jobs in Pascal, C, C++, C#, and now Ruby (damn I sound old)), I have repeatedly observed developers who vary in productivity by a factor of 10x, and I believe it is a worthwhile exercise to try to understand the specifics behind this vast difference.

<OBLIGATORY DISCLAIMER>

I do not profess to be kid-tested or mother-approved (let alone academically rigorous or scientifically proven) when it comes to prescribing what exactly these qualities are. But I do have a blog, an opinion, and an eagerness to discuss this topic with others. I realize this is highly subjective, but that’s part of what makes it fun to try to quantify.

</OBLIGATORY DISCLAIMER>

As a starting point for discussion, I hereby submit the following chart which approximates the general progression I’ve observed in developers that move from being marginally productive to absurdly productive. It works from the bottom (stuff good junior programmers do) to the top (stuff superstars do).

Levels of Developer Productivity

And here is my rationale, starting from the bottom:

Level 1: Qualities possessed by an effective newbie

Comments on something, somewhere. A starting point for effective code commenting is that you’re not too lazy or obstinate to do it in principle.

Self-confident. Programmers that lack self-confidence can never be extremely effective, because they spend so much time second-guessing their own code and asking their lead about it. That said, if you’re learning, it’s far better to know what you don’t know than to think you do know when you really don’t. So this one can be a difficult balance, particularly as coders grow in experience and become less willing to ask questions.

Abides by existing conventions. This is something that new coders actually tend to have working in their favor over veterans. On balance, they seem to be more willing to adapt to the conventions of existing code, rather than turning a five person programming project into a spaghetti codebase with five different styles. A codebase with multiple disparate conventions is a subtle and insidious technical debt. Usually programmers starting out are pretty good at avoiding this, even if their reason is simply that they don’t have their own sense for style.

Doesn’t stay stuck on a problem without calling for backup. This ties into the aforementioned danger with being self-confident. This is another area where young programmers tend to do pretty well, while more experienced coders can sometimes let themselves get trapped more frequently. But if you can avoid this when you’re starting, you’re getting off on the right foot.


Level 2: Qualities possessed by an effective intermediate programmer

Understands the significance of a method’s contract. Also known as “writes methods that don’t have unexpected side effects.” This basically means that the developer is good at naming their functions/methods, such that the function/method does not affect the objects that are passed to it in a way that isn’t implied by its name. For example, when I first started coding, I would write functions with names like “inflictDamage(player)” that might reduce a player’s hitpoints, change their AI state, and change the AI state of the enemies around the player. As I became more experienced, I learned that “superfunctions” like this were not only impossible to adapt, but they were very confusing when read by another programmer from their calling point: “I thought it was just supposed to inflict damage, why did it change the AI state of the enemies?”

Figures out missing steps when given tasks. This is a key difference between a developer who is a net asset and one who is a net liability. Oftentimes, a level 0 developer will appear to be getting a lot done, but their productivity depends on having more advanced programmers that they consult with whenever they confront a non-trivial problem. As a lead developer, I would attempt to move my more junior developers up the hierarchy by asking “Well, what do you think?” or “What information would you need to be able to answer that question?” This was usually followed by a question like, “So what breakpoint could you set in your debugger to be able to get the information you need?” Helping them figure it out themselves builds confidence, while implicitly teaching that it is not efficient to break their teammates’ “mental context” for problems that they have the power to solve themselves.

Consistently focused, on task. OK, this isn’t really one that people “figure out” at level two, so much as it is a quality that usually accounts for a 2x difference in productivity between those that have it and those that don’t. I don’t know how you teach this, though, and I’d estimate that about half of the programmers I worked with at my past jobs ended up spending 25-50% of their day randomly browsing tech sites (justified in their head as “research?”). Woe is our GDP.

Handy with a debugger. Slow: figure out how a complicated system works on paper or in your head. Fast: run a complicated system and see what it does. Slow: carefully consider how to adapt a system to make it do something tricky. Fast: change it, put in a breakpoint, does it work? What variables cause it not to? Caveat: You’ve still got to think about edge cases that didn’t naturally occur in your debugging state.


Level 3: Hey, you’re pretty good!

Thinks through edge cases / good bug spotter. When I worked at my first job, I often felt that programmers should be judged equally on the number of their own bugs they fixed and the number of other programmers’ bugs they found. Of course, there are those that will make the case that bug testing is the job of QA. And yes, QA is good. But QA could never be as effective as a developer who naturally has a sense for the edge cases that could affect their code, and so doesn’t write those bugs in the first place. And getting QA to try to reproduce intermittent bugs is usually no less work than just examining the code and thinking about what might be broken.

Unwilling to accept anomalous behavior. Good programmers learn that there is a serious cost to “code folklore” — the mysterious behaviors that have been vaguely attributed to a system without understanding the full root cause. Code folklore eventually leads to code paranoia, which can cause inability to refactor, or reluctance to touch that one thing that is so ugly, yet so mysterious and fragile.

Understands foreign code quickly. If this post is supper, here’s the steak. Weak developers need code to be their own to be able to work with it. Proficient coders are proficient because, for the 75% of the time that they are not working in code they recently wrote, they can figure out what’s going on without hours or days of rumination. More advanced versions of this trait are featured in level four (“Can adapt foreign code quickly”) and level five (“Accepts foreign code as own”). Stay tuned for the thrilling conclusion on what the experts can do with code that isn’t theirs.

Doesn’t write the same code twice. OK, I’ve now burned through at least a thousand words without being a language zealot, so please allow me this brief lapse into why I heart introspective languages: duplicated code is the root of great evil. The obvious drawback of writing the same thing twice is that it took longer to write it, and it will take longer to adapt it. The more sinister implications are that, if the same method is implemented in three different ways, paralysis often sets in and developers become unwilling to consolidate the methods or figure out which is the best one to call. In a worst case scenario, they write their own method, and the spiral into madness is fully underway.
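To make the introspection point concrete, here is a toy sketch (invented for this post, not from any real codebase) of how Ruby lets you collapse three near-identical methods into a single definition instead of copy-pasting them:

```ruby
# Invented example: one introspective definition replaces hand-written
# reduce_hitpoints, reduce_mana, and reduce_stamina methods.
class Player
  attr_accessor :hitpoints, :mana, :stamina

  [:hitpoints, :mana, :stamina].each do |stat|
    # define_method builds each reducer from the stat's name at load time
    define_method("reduce_#{stat}") do |amount|
      send("#{stat}=", send(stat) - amount)
    end
  end
end

player = Player.new
player.hitpoints = 100
player.reduce_hitpoints(30)
player.hitpoints # => 70
```

When the three methods inevitably need to change, there is exactly one place to change them, which is the whole point.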

Level 4: You’re one of the best programmers in your company

Comments consistently on code goals/purpose/gotchas. Bonus points for examples. You’ll notice that I haven’t mentioned code commenting since level one? It is out of my begrudging respect for the Crazed Rubyists who espouse the viewpoint that “good code is self-documenting.” In an expressive language, I will buy that to an extent. But to my mind, there is no question that large systems necessarily have complicated parts, and no matter how brilliantly you implement those parts, the coders that follow you will take longer to assimilate them if the docs are thin. Think about how you feel when you find a plugin or gem that has great documentation. Do you get that satisfying, “I know what’s going on and am going to implement this immediately”-sort of feeling? Or do you get the “Oh God, this plugin looks like exactly what I need, but it’s going to take four hours to figure out how to use the damned thing” feeling? An extra hour spent by the original programmer to throw you a bone would have saved you (and countless others) that time. Now do you still sympathize with the viewpoint that their code was “self-documenting”?

Can adapt foreign code quickly. This is the next level of “understands foreign code quickly.” Not only do you understand it, but you know how to change it without breaking stuff or changing its style unnecessarily. Go get ’em.

Doesn’t write the same concept twice. And this is the next level of “doesn’t write the same code twice.” In a nutshell, this is the superpower that good system architects possess: a knack for seeing patterns and similarities across a system, and knowing how to conceptualize that pattern into a digestible unit that is modular, and thus maintainable.

Level 5: Have I mentioned to you that we’re hiring?

Constant experimenting for incremental gains. The best of the best feel an uncomfortable churn in their stomach if they have to use “find all” to get to a method’s definition. They feel a thrill of victory if they can type in “hpl/s” to open a file in the “hand_picked_lists” directory called “show” (thank you Rubymine!). They don’t settle for a slow development environment, build process, or test suite. They cause trouble if they don’t have the best possible tools to do their job effectively. Each little thing the programming expert does might only increase their overall productivity by 1% or less, but since developer productivity is a bell curve, those last few percent ratchet the expert developer from the 95th to 99th percentile.

Accepts foreign code as their own. OK, I admit that this is a weird thing to put at the top of my pyramid, but it’s so extremely rare and valuable that I figure it will stand as a worthwhile challenge to developers, if nothing else. Whereas good developers understand others’ code and great developers can adapt it, truly extraordinary developers like foreign code just as much as they like their own. In a trivial case, their boss loves them because rather than complaining that code isn’t right, they will make just the right number of revisions to improve what needs to be improved (and, pursuant to level 3, they will have thought through edge cases before changing). In a more interesting example, they might take the time to grok and extensively revise a plugin that most developers would just throw away, because the expert developer has ascertained that it will be 10% faster to make heavy revisions than to rewrite from scratch. In a nutshell, whereas most developers tolerate the imperfections in code that is not their own, experts empathize with how those imperfections came about, and they have a knack for figuring out the shortest path to making the code in question more usable. And as a bonus, they often “just do it” sans the lamentations of their less productive counterparts.

This isn’t to say they won’t revise other people’s crappy code. But they’re just as likely to revise the crappy code of others as they are their own crappy code (hey, it happens). The trick that these experts pull is that they weigh the merits of their own code against the code of others with only one objective: what will get the job done best?

Teach me better

What qualities do you think are shared by the most effective developers? What do you think are the rarest and most desirable qualities to find? Would love to hear from a couple meta-thinkers to compare notes on the similarities you’ve observed amongst your most astoundingly productive colleagues.

Rails: Beware the custom truncate

Ran into an interesting bug a few days ago that I’ve been meaning to document for anyone who has written their own truncating function in Rails. If you have, I’d guess it probably looked something like this:

def truncate(str, length)
	return '' if str.blank?
	truncated = str.size > length
	# str[0, n] takes the first n characters, leaving room for the ellipsis;
	# an inclusive range like str[0..n] would return one character too many
	truncated ? (str[0, length - 3] + '...') : str
end

Fine enough. Unless you happen to be truncating user-given strings.

If you are letting your users enter the information that gets truncated, chances are that some of them are entering Unicode characters for accents, quotation marks, etc. Because non-ASCII Unicode characters are 2-4 bytes long in UTF-8, the above truncate will split characters in half and cause general headaches if it truncates text that contains them.

Split-in-half characters are bad news. They will cause errors like “illegal JSON output,” which is how I originally spotted this as a problem with our truncate method.
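To see the failure mode directly: on the Ruby 1.8 builds where this bit us, string indexing was byte-based; modern Rubies index by character, but byte-oriented slicing still reproduces the problem. A quick illustration:

```ruby
# Reproducing the bug with byte-based slicing: "é" is two bytes in UTF-8,
# so a cut at a byte boundary can land in the middle of a character.
s = "héllo"                   # 5 characters, 6 bytes
fragment = s.byteslice(0, 2)  # grabs "h" plus only the first byte of "é"

s.bytesize                    # => 6
fragment.valid_encoding?      # => false -- the broken string that upsets JSON encoders
```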

The solution is to take a page from Rails’ own truncate, and use #mb_chars. So a hand-written truncate that works for unicode would be:

def truncate(str, length)
	return '' if str.blank?
	# mb_chars counts and slices by character rather than by byte
	chars = str.mb_chars
	truncated = chars.size > length
	truncated ? (chars[0, length - 3] + '...').to_s : str
end

You’re welcome.

Join Multiple Copies of Same Model in Thinking Sphinx Index

This was a vexing problem that probably affects 0.1% of all Thinking Sphinx users, but for those select few, you can benefit from my pain.

We have a model with the following associations:

has_many :merch_match_data_tags, :class_name => "MerchMatchItemData", :dependent => :delete_all
has_one :mm_bizarre, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::BIZARRE }
has_one :mm_good_picture, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::NICE_PICTURE }
has_one :mm_funny, :class_name => "MerchMatchItemData", :conditions => { :data_tag_id => MerchMatchItemData::FUNNY }

Intending to add these to a Sphinx index, we used the following code:

has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_funny.tag_count, :as => :funny_points

What perplexed me after trying this was that while the “good_picture_points” could be queried and sorted, bizarre_points and funny_points returned no Sphinx results. Looking into the output generated by thinking_sphinx:configure, I discovered why:

...
LEFT OUTER JOIN `merch_match_item_datas` ON merch_match_item_datas.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 0   LEFT OUTER JOIN `merch_match_item_datas` mm_bizarres_items ON mm_bizarres_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 2   LEFT OUTER JOIN `merch_match_item_datas` mm_good_values_items ON mm_good_values_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 7   LEFT OUTER JOIN `merch_match_item_datas` mm_funnies_items ON mm_funnies_items.item_id = items.id AND `merch_match_item_datas`.`data_tag_id` = 4  
...

The problem was that, in determining the SQL to build, Thinking Sphinx uses the first association it comes across as the default set of conditions for all future joins to the table. So, in this case, anything that joined the merch_match_item_datas table was going to join with the data_tag_id = 0 condition of our first declared association (mm_good_picture). That is, mm_bizarre was now looking for data_tag_id=0 and data_tag_id=[id of bizarre tag]. So, nothing was returned.

After a bit of head scratching, I came up with the following workaround for this:

has merch_match_data_tags.tag_count
has mm_good_picture.tag_count, :as => :good_picture_points
has mm_bizarre.tag_count, :as => :bizarre_points
has mm_good_value.tag_count, :as => :good_value_points

Basically, just make the first association that Thinking Sphinx encounters be an unqualified, unfiltered association (merch_match_data_tags) to the merch_match_item_datas table. This ensures that the proper join structure is set up, so all of the subsequent has attributes function as they should.

Hope that I’m not the only one ever to find this useful.

Hint for Job Seekers: Wake Up and Write!

Over the last three years I’ve spent at least 6 months hiring, which equates to more than 1,000 applicants reviewed. But even before I had seen our 50th applicant, I was stunned by the applicant apathy that pervaded our job inbox. At first I figured it must be us. When we were initially hiring, it was for the opportunity to work for peanuts at an unproven web startup. Surely this must explain why 95% of the applications we received were a resume accompanied by a generic cover letter, or no cover letter at all.

But now that we have proven our business, with ample resources to bring aboard top tier talent, I am baffled at the scarcity of job seekers who understand the opportunity that the cover letter presents for them to stand out from the other 49 applications I’ll receive today.

Think about it, job seeker. Every day, my inbox is flooded with anywhere from 25-50 applicants. Each of these applicants sends a resume, and each of these resumes details experience at a bunch of companies I haven’t heard of, in job titles that can only hint at what the person might have really done on a day-to-day basis.

If you were me under these circumstances, how would you pick out the most interesting applicants? What would wake you up from the torrent of generic cover letters and byzantine job histories?

P-E-R-S-O-N-A-L-I-T-Y.

When I am not paying close attention, it feels like the same guy has been applying for our job repeatedly for months, each time with a slightly different form letter to accompany his list of job titles.

The applicants that wake me up from this march of sameness are those 5% that demonstrate they have actually taken the 5 minutes to understand what Bonanzle is, what about the company gets them excited, and why they would be a good fit relative to our objectives and specific job description. (IMPORTANT NOTE: Batch-replacing [company name] with “Bonanzle” does not qualify as personalizing)

Interestingly, the applicants for business-related positions we’ve posted in the past tend to do a comparatively phenomenal job at this. If only these business people had design, UI, or programming skills, they would immediately ascend to the top of our “To interview” list. But the actual creators — programmers, designers, and UI experts — just don’t seem to get it. I suppose it could be a chicken-and-egg situation, where the minority of them that do get it are swooped up immediately by companies that crave that glimpse of personality, and the rest of them keep blindly applying to every job on Craigslist without giving a damn.

The other sorely underrepresented aspect to a good application? Decent portfolios. If you’re a designer, take the slight interest I’ve already expressed toward resumes, and cut it in half. Your value is much easier to ascertain by what you’ve done than what you’ve said, and you have the perfect opportunity to show us what you’ve done by creating a modern, user-friendly portfolio. On average, I’d estimate I see about one modern, well-constructed portfolio for every 20 designers that apply. (Personal bias: Flash-based portfolio sites load slowly and feel staid; I might be unique in that opinion though)

I see a huge opportunity for applicants to wake up and realize how little effort it would take to create an application that shines. You want to be a real overachiever? Why not spend 15 minutes to sign up for an account and browse the site, and incorporate that experience into your cover letter? Amongst more than 50 applicants for our first hire, Mark Dorsey, aka BonanzleMark aka the best hire I’ve made so far, was the SINGLE applicant that spent the 15 minutes required to do this. In more than 500 applications since, I have yet to see it again.

The world is rife with creative ways to get your application noticed. All it takes is 15-30 minutes of your time (including time to personalize the letter) to rise into the 90th percentile. If it’s a job you care about, you’re earning a potentially $100k salary for 30 minutes of work = about $3-4k per minute. I know lawyers that don’t even make that much.

Rails tests: One line to seriously pump up the speed

Alright, I admit it: I didn’t seriously write tests for Bonanzle until a couple months ago. But I had my reasons, and I think they were good ones.

Reason #1 was that I hated everything about fixtures. I hated creating them, I hated updating them every time we migrated, and I hated remembering which fixture corresponded to which record. Factory Girl was the panacea for this woe.

Reason #2 was that it took an eon to run even a single test. When trying to iterate tests and fixes, this meant that I ended up spending my time 10 parts waiting to one part coding. After much digging, I eventually determined that 90% of our test load time was attributable to caching all our classes in advance. Of course, my first inclination was simply to not cache classes in our test environment, which actually worked reasonably well to speed tests the hell up, until I started writing integration tests, and found our models getting undefined and unusable over the course of multiple requests. Then, I found the answer:

config.eager_load_paths.clear

This line basically says: even if you set config.cache_classes = true, Rails should not try to pre-load all models (of which, in our case, there are more than 100).

Adding this line allows us to cache classes in test (which fixes the integration test problems), while at the same time getting the benefits of a configuration that doesn’t take 2 minutes to load.
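For reference, the whole change amounts to something like this in the test environment config (a Rails 2.x-era sketch; later Rails versions reorganized how eager loading is configured, so adapt accordingly):

```ruby
# config/environments/test.rb (Rails 2.x-era sketch)
config.cache_classes = true      # keep classes cached so integration tests survive multiple requests
config.eager_load_paths.clear    # ...but skip pre-loading all 100+ models at boot
```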

(Of course, also key was configuring our test rakefile such that we could run single tests, rather than being obligated to run the entire suite of tests at once. If anyone finds this post and doesn’t yet know how to invoke a single test, post a comment and I’ll get unlazy and post the code for that)

Get Session in Rails Integration Test

From the results Google gives on this, it seems that about three people in the world are using integration tests in Rails, and two of them stopped programming in 2007.

My goal: to get at session data from within an integration test.

Bad news: I don’t know any way to do this without first calling a controller action from within your integration test.

Good news: I have example code on how to get at it after making a request.

# note: Test::Unit only auto-runs methods whose names start with "test"
def test_add_to_cart
  s = open_session
  s.post url_for(:controller => 'shopping_cart', :action => :add_item, :item_id => 1)
  s.session.data # a populated session hash
  s.flash # your flash data
end

And there you have it. Here is a more detailed example that may be of use, derived from working integration tests in our codebase:

open_session do |s|
	item_1 = FactoryHelper::ItemFactory.create_sellable_item
	add_item_to_cart(item_1, s)
	user = Factory(:activated_user, {:password => USER_PASSWORD})
	login_user(user, USER_PASSWORD, s)

	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert_equal all_offers.size, 1
	the_offer = all_offers.first

	# Trial 1:  Simulate dragging item out of cart
	s.post url_for(:controller => 'lootbins', :action => :remove_from_lootbin_draggable, :item => item_1.id)

	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert all_offers.empty?
	assert !Offer.exists?(the_offer)

	# Trial 2:  Simulate removing item from cart on cart page
	add_item_to_cart(item_1, s)
	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert_equal all_offers.size, 1
	the_offer = all_offers.first

	s.put url_for(:controller => 'offers', :action => :remove_from_offer, :item_id => item_1.id, :id => the_offer.id)
	s.post url_for(:controller => 'offers', :action => :cart_summary)
	all_offers = s.assigns(:all_offers_by_booth)
	assert all_offers.empty?
	assert !Offer.exists?(the_offer)
end

And here are the helpers involved:

def add_item_to_cart(item, os)
	items_in_bin = begin
		lootbin = Lootbin.new(os.session.data) # equivalent to ApplicationController#my_lootbin
		lootbin.offer_items.size
	rescue
		0
	end
	os.post url_for(:controller => 'lootbins', :action => :add_to_lootbin_draggable, :item => item[:id])
	lootbin = Lootbin.new(os.session.data) # equivalent to ApplicationController#my_lootbin
	assert lootbin
	assert lootbin.has_items?
	assert_equal lootbin.offer_items.size, items_in_bin+1
end

def login_user(user, password, integration_session)
	integration_session.https!
	integration_session.post url_for(:controller => 'sessions', :action => :create, :username => user.user_name, :password => password)
		
	assert_equal integration_session.session.data[:user_id], user.id
end

Paypal Masspay Ruby Example

The title of this post is a Google query that yielded no good results, but plenty of puzzled users trying to figure out how to make it work. I’ve only been playing with it for about half an hour, but this code is getting Paypal to tell me that the request is successful:

# these requires belong at the top of the file if you're outside Rails
require 'net/https'
require 'cgi'
require 'uri'

def self.send_money(to_email, how_much_in_cents, options = {})
	credentials = {
	  "USER" => API_USERNAME,
	  "PWD" => API_PASSWORD,
	  "SIGNATURE" => API_SIGNATURE,
	}

	params = {
	  "METHOD" => "MassPay",
	  "CURRENCYCODE" => "USD",
	  "RECEIVERTYPE" => "EmailAddress",
	  "L_EMAIL0" => to_email,
	  "L_AMT0" => ((how_much_in_cents.to_i)/100.to_f).to_s,
	  "VERSION" => "51.0"
	}

	endpoint = RAILS_ENV == 'production' ? "https://api-3t.paypal.com" : "https://api-3t.sandbox.paypal.com"
	url = URI.parse(endpoint)
	http = Net::HTTP.new(url.host, url.port)
	http.use_ssl = true
	all_params = credentials.merge(params)
	stringified_params = all_params.collect { |tuple| "#{tuple.first}=#{CGI.escape(tuple.last)}" }.join("&")

	response = http.post("/nvp", stringified_params)
end

Certainly not the tersest solution, but I’ve kept it a bit verbose to make it clearer what’s happening.
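For completeness, PayPal answers with a URL-encoded name/value string, so a minimal success check might look like this (a helper I’m sketching here, not part of the original code; MassPay returns ACK=Success when the request is accepted):

```ruby
require 'cgi'

# Hypothetical helper: turn PayPal's NVP response body
# ("ACK=Success&CORRELATIONID=...") into a plain hash of strings.
def parse_nvp(body)
  CGI.parse(body).inject({}) { |hash, (key, values)| hash[key] = values.first; hash }
end

fields = parse_nvp("ACK=Success&CORRELATIONID=abc123")
fields["ACK"] # => "Success"
```

From there you can branch on the ACK field and log the CORRELATIONID when PayPal rejects a request.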

One point of note is that you’ll need separate credentials when submitting to sandbox vs. production. You can sign up for a sandbox account by clicking “Sign up” from this page. After you have your account, login, click on “profile,” then get your API access credentials.

Here is the PHP example I based my code on and here is the Paypal Masspay NVP documentation that was marginally helpful in figuring out what params to pass.

Best of Rails GUI, Performance, and other Utilities

I’m all about putting order to “best of” and “worst of” lists, so why not give some brief props to the tools, plugins, and utilities that make life on Rails a wondrous thing to behold?

5. Phusion Passenger. OK, this would probably be first on the list, but it’s already been around so long that I think it’s officially time to start taking it for granted. But before we completely take it for granted, would anyone care to take a moment to remember what life was like in a world of round-robin balanced Mongrel web servers? You wouldn’t? Yah, me neither. But no matter how I try, I cannot expunge memories of repeatedly waking up to the site alarm at 7am to discover somebody had jammed up all the Mongrels with their stinking store update, and now I’ve got to figure out some way to get them to stop.

4. jQuery / jRails. This probably deserves to score higher, as the difference between jQuery and Prototype is comparable to the difference between Rails and PHP. But since it’s not really Rails-specific, I’m going to slot it at four and give major props to John Resig for being such an attentive and meticulous creator. Without jQuery and jQuery UI, the entire web would be at least 1-2 years behind where it is in terms of interactivity, and I don’t think that’s hyperbole. (It even takes the non-stop frown off my face when I’m writing Javascript. With jQuery, it’s merely an intermittent frown mixed with ambivalence!)

3. Sphinx / Thinking Sphinx. There’s a reason that, within about six months’ time, Thinking Sphinx usurped the crown of “most used full text search” utility from “UltraSphinx.” And the reason is that it takes something (full text search) that is extremely complicated, and it makes it stupidly easy. And not just easy, but extremely flexible. Bonanzle has bent Sphinx (0.9.8) into doing acrobatics that I would have never guessed would be possible, like updating it in nearly-real time as users log in and log out. Not to mention the fact that it can search full text data from 4 million records in scant milliseconds.

Sphinx itself is a great tool, too, though if I were going to be greedy I would wish that 0.9.9 didn’t reduce performance over 0.9.8 by around 50% in our testing, and I would wish that it got updated more often than once or twice per year. But in the absence of credible competition, it’s a great search solution, and rock solidly stable.

2. New Relic. OK, I’ll admit that I’ve had my ups and downs with New Relic, and with the amount of time I’ve spent complaining to their team about the UI changes from v1 to v2, they probably have no idea that it still ranks second on my list, ahead of Sphinx, among the most terrific Rails tools. But it does, because, like all the members of this list, 1) the “next best” choice is so far back that it might as well not exist (parsing logs with pl-analyze? Crude and barely useful. Scout? Nice creator, but the product is still a tot. Fiveruns? Oh wait, Fiveruns doesn’t exist anymore. Thank goodness) and 2) it is perhaps the most essential tool for running a production-quality Rails site. Every time I visit an ASP site and get the infinite spinner of doom when I submit a form, I think to myself, “they must not know that every time I submit a form it takes 60 seconds to return. That would suck.” On a daily basis, I probably only use 10% of the functionality in New Relic, but without that 10%, the time I’d spend tracking logs and calculating metrics would make my life unfathomably less fun.

1. Rubymine. The team that created this product is insane. Every time that I hit CTRL-O and I type in “or/n” and it pops up all files in the “offers_resolution” folder starting with the letter “n,” I know they are insane, because having that much attention to productivity is beyond what sane developers do. Again, for context’s sake, one has to consider the “next best” choice, which, for Linux (or Windows) is arguably a plain text editor (unless you don’t mind waiting for Eclipse to load and crash a few times per day). But, instead of programming like cavemen, we have a tool that provides killer function/file lookup; impeccable code highlighting and error detection (I had missed that, working in a non-compiled language); a working visual debugger; and, oh yeah, a better Git GUI than any of five standalone tools that were built specifically to be Git GUIs.

Perhaps as important as what Rubymine does is what it doesn’t do. It barely ever slows down, it doesn’t make me manage/update my project (automatically detecting new files and automatically adding new files to Git when I create them from within Rubymine), and it handles tabs/spaces like a nimble sorcerer (something that proved to be exceedingly rare in my quest for a usable IDE).

Like New Relic, I probably end up using only a fraction of the features it has available, but I simply can’t think of anything short of writing my code for me that Rubymine could do that it doesn’t already handle like a champ. Two thumbs up with a mutated third thumb up if I had one.

Conclusion

Yes, it is a list of apples and oranges, but the commonality is that all five-ish of the list’s members stand apart from the “second best” solution in their domain by a factor of more than 2x. All of them make me feel powerful when I use them. And all of them, except arguably New Relic, are free or bargain priced. Hooray for life after Microsoft. Oh how doomed are we all.

Rails Unified Application Logging with Log Weaver

The Problem

After adding our third app server a couple days ago, the appeal of digging through three separate production.log files when things go awry on Bonanzle was officially over.

Like many Rails developers in this situation, I Googled numerous terms in search of a solution, and most of those searches sent me to my good friend Jesse Proudman’s blog on using Syslog-ng to unify Rails logfiles. So we installed it (well, to be specific, we had Blue Box set it up for us, because it looked complicated), and determined that it was not what we were looking for. Installation issues aside (of which there were a few), the real killer when using Syslogger with Rails is that you lose the buffered production log output you have come to know and love, leaving your production logfiles a stew of mishmashed lines from numerous Passenger processes in numerous states of processing. In short, if you get an appreciable amount of traffic (and I’d imagine you do if you’re reading this in the first place), and you are a human being, you will not be able to read an unbuffered Rails log without considerable time and frustration.

The Solution

Exists on Github here.

Since it looked like Syslog was the only game in town currently for merging application server logs, I decided to spend the afternoon writing a plugin that would allow us to take an arbitrary number of production logfiles, from an arbitrary number of hosts, and merge them together into one file without changing the formatting of the production logs or affecting performance on the app servers.

The basic mechanism of this plugin is that it uses rsync to grab your production logs, then it boils those production logs down into hashes of { :time => action_time, :text => action_text }. It then outputs the actions from all of your app servers into a single file in chronological order.
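The parse-and-merge step can be sketched roughly like this. This is an illustrative reconstruction, not the plugin’s actual source: the timestamp regex assumes the standard “Processing FooController#bar ... at YYYY-MM-DD HH:MM:SS” line that classic Rails production logs emit, and the method names are my own.

```ruby
require "time"

# Each Rails action in a classic production.log begins with a line like:
#   Processing AController#x (for 1.2.3.4 ...) at 2010-01-15 12:00:02
ACTION_START = /^Processing .* at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/

# Boil a (buffered) production log down into { :time, :text } hashes,
# one per action, keeping each action's lines glued together.
def parse_actions(log_text)
  actions = []
  current = nil
  log_text.each_line do |line|
    if line =~ ACTION_START
      actions << current if current
      current = { :time => Time.parse($1), :text => line.dup }
    elsif current
      current[:text] << line
    end
  end
  actions << current if current
  actions
end

# Merge the actions from all app servers into chronological order.
def weave(*logs)
  logs.flat_map { |log| parse_actions(log) }.sort_by { |a| a[:time] }
end
```

The key property is that each action stays intact as a block; only whole actions are interleaved by timestamp, which is what keeps the unified log readable where raw syslog interleaving is not.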

As a bonus, it also lets you specify the maximum size of your unified log file, and handles keeping the logfiles broken into bite-sized chunks, so that you can actually read the output afterwards (rather than ending up with a 5GB log file). This functionality is built in, and can be configured via an included YML file.

The remainder of this post will just quote from the Github project, which does a pretty fine job of explaining what’s going on:

Log Weaver Functionality
========================
Log Weaver v0.2 sports the following featureset:

* Sync as many production logfiles, from as many hosts as you want, into a single logfile on a single server
* Use your existing Rails production.log, no need to modify how it’s formatted
* Break up your unified log file into bite sized chunks that you specify from a YML file, so you don’t end up with a 10 GB unified logfile that you can’t open or use
* Does not run on or adversely affect performance of app servers. Uses rsync to grab production log files from app servers, then does the “hard work” of combining them on what is presumably a separate server.

Installation
============
Clone the log-weaver github project into your vendor/plugins directory. No models, database tables, or installation is needed for this plugin. Simply edit the /log-weaver/config/system_logs.yml file to specify the settings of your hosts.
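For a sense of what the settings file contains, here is a hypothetical sketch of a system_logs.yml. The key names below are illustrative, not the plugin’s real schema; consult the bundled file in the Github project for the actual format.

```yaml
# Hypothetical example only -- see vendor/plugins/log-weaver/config/system_logs.yml
# in the project for the real keys.
hosts:
  app1:
    remote_log: deploy@app1.example.com:/var/www/app/shared/log/production.log
  app2:
    remote_log: deploy@app2.example.com:/var/www/app/shared/log/production.log
unified_log: /var/log/rails/unified.log
max_log_size_mb: 100
```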

Usage
=====
Run “rake log_weaver:weave_logs” to initiate the process of log merging.

When you run this task, Log Weaver will rsync your logfiles from the locations you specified in the YML file. The first time you run Log Weaver, this might take a minute or two, depending on how big your production logs are. On subsequent runs, it should be pretty instantaneous, since rsync is smart about transferring only what has changed.

After rsyncing your production logs from your app servers, Log Weaver will build the production logs into bite-sized hashes and merge them in chronological order into the file that you specified in your YML file. Limited testing has shown that combining three logfiles that were each 1+ GB used less than 300MB of memory and completed in about 10 minutes.

Run “rake log_weaver:weave_logs” periodically to add to your unified log. When the size of your unified log exceeds the size you specified in the settings YML file, the unified log will be renamed “[log_name]_X.log”, where X is the lowest integer for which no such file already exists in your log directory. That is, if you named your file “unified.log,” Log Weaver would move the original log to “unified_2.log” and then open a new “unified.log” file to continue merging your logs.
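The rotation rule just described can be sketched in a few lines. Again, this is an illustration of the behavior, with a method name of my own choosing, not the plugin’s actual code:

```ruby
# When the unified log exceeds max_bytes, move it aside to
# "<name>_X.log", where X is the lowest integer not already taken,
# then let the caller open a fresh "<name>.log".
def rotate_log(path, max_bytes)
  return unless File.exist?(path) && File.size(path) > max_bytes
  base = File.join(File.dirname(path), File.basename(path, ".log"))
  x = 2
  x += 1 while File.exist?("#{base}_#{x}.log")
  File.rename(path, "#{base}_#{x}.log")
end
```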

Future Improvements
===================
Log Weaver was written over the course of a few hours to fit the baseline needs of Bonanzle, so there is surely plenty of room to improve! For starters, this would probably make more sense as a gem than a Rails plugin.

Feel free to fork and add whatever you think it needs and ping me to pull in your improvements and we can make this plugin a worthwhile thing.

Sphinx 0.9.9 Review, A Cautionary Tale

After my previous raves about Sphinx in general and Thinking Sphinx in particular, I was excited to get my hands on the new Sphinx 0.9.9 release that was finally made available at the beginning of December via the Sphinx Search site.

Given that our Sphinx usage falls under what I think would be the “advanced cases” heading, I expected probably a day or two of upgrade headaches before we’d be back on track. Worth it, said I, for the potential to get a working index merge, which could set the stage for indexing more often than once every four hours (our current index takes about 3 hours, plus time to transfer files between the machine that builds the Sphinx index and the search daemon machines).

Alas, our upgrade did not go according to plan.

This Monkey Patches Going to Heaven

Given how prompt Pat Allen (creator of Thinking Sphinx) has been in addressing and fixing bugs in the past, I don’t doubt that many of our upgrade headaches from the TS side will probably be fixed soon (if not already, since I emailed him most of our issues). That said, we required about five monkey patches to get the most recent version of TS with 0.9.9 working the same as our previous TS with 0.9.8 did. The patches ranged from patching the total_entries method (which throws an exception if the search can’t be completed), to real-time updates not working (via client#update), to searches that passed a string to TS where it expected an int throwing an exception.

This does not include “expected” differences, such as the fact that search is now lazily evaluated, so if you previously wrapped your search statements in a begin-rescue block to catch possible errors, your paradigm needs to shift.
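The shift is easy to see with a self-contained stand-in (no Thinking Sphinx required; the LazySearch class below is hypothetical, built only to mimic the deferred evaluation that 0.9.9 introduces):

```ruby
# LazySearch defers the actual query until results are first
# accessed, the way TS search results are lazily evaluated in 0.9.9.
class LazySearch
  include Enumerable

  def initialize(&query)
    @query = query
  end

  def each(&block)
    @results ||= @query.call # the work (and any error) happens here
    @results.each(&block)
  end
end

failing = LazySearch.new { raise IOError, "searchd unreachable" }

# The old habit: rescue around the search call itself. With lazy
# evaluation this rescues nothing, because the query hasn't run yet.
results = begin
  failing
rescue IOError
  []
end

# The rescue has to move to wherever the results are first used:
safe = begin
  results.to_a
rescue IOError
  []
end
```

In other words, every begin-rescue that used to wrap the search call needs to migrate to the point where the results are first enumerated (in a view, a serializer, wherever), which can be a surprisingly invasive change.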

It also appears that the after_commit plugin bundled with TS 0.9.8 has been modified such that it is not available to models in our project by default. I never figured out a fix for that bug, since by the time I noticed it, I had also become aware of an even bigger 0.9.9 detriment: overall performance. Reviewing our New Relic stats since we updated to 0.9.9, we have found an across-the-board decrease of about 50% to our Sphinx calls. I parsed the Sphinx logs to try to ascertain whether the slowness was spawning from Sphinx or TS, and Sphinx appears to be the main culprit.

Performance

TS 0.9.8

Considering 290,227 searches.
Average time is 0.0319 s.
Median time is 0.005 s.
Top 10% average is 0.1936 s across 29,022 queries.

TS 0.9.9

Considering 843,569 searches.
Average time is 0.0430 s.
Median time is 0.006 s.
Top 10% average is 0.2866 s across 84,356 queries.

Many of our queries take 0.00 or 0.01, so the median doesn’t look too much different between the two, but the average time (which is what New Relic picks up on) is 35% slower in Sphinx alone, and about 50% slower once all is said and done. An action on our site that does a Sphinx search for similar items (and nothing else) consistently averaged 200 ms for weeks before our upgrade, and has averaged almost exactly 300 ms for the week since the upgrade.
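For the curious, stats like the ones above are straightforward to compute once you have a list of per-query times in seconds. The sketch below assumes you have already extracted those times; parsing searchd’s query log is omitted, since its format varies by version:

```ruby
# Given an array of query times (in seconds), compute the summary
# stats quoted above: count, average, median, and top-10% average.
def query_stats(times)
  sorted = times.sort
  top = sorted.last([times.size / 10, 1].max) # slowest ~10% of queries
  {
    :count      => times.size,
    :avg        => times.sum / times.size.to_f,
    :median     => sorted[times.size / 2],
    :top_10_avg => top.sum / top.size.to_f
  }
end
```

The top-10% average is worth tracking separately because, as with our numbers, a median of a few milliseconds can hide a long tail that dominates the average New Relic reports.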

Conclusion: Proceed with Caution

It would be nice if I had more time to debug why this slowness has come about, but the bottom line for us is that, after spending about 3 days patching TS to get it to work in the first place, and with the “after_commit” anomaly still on our plate (not to mention overall memory usage increasing by about 20%), I have ultimately decided to return to TS 0.9.8 until such time as a release of Sphinx is made available that speaks directly to its performance compared to previous versions. I think the Sphinx team is doing a great job, but amongst juggling the numerous new features, it seems that performance testing relative to 0.9.8 didn’t make the final cut?

Or there could always be some terrible misconfiguration on our part. But given that we changed our configuration as little as possible in moving from 0.9.8->0.9.9, if we are screwing up, I would say it is for perfectly understandable reasons.

A three day window of a pure search action in our app. First two days with TS 0.9.9 average 300 ms, yesterday after reverting back to 0.9.8 about 200 ms