Continuing my series of reviews of the plugins and products that have made Bonanzle great, today I’ll talk about Thinking Sphinx: How we’ve used it, what it’s done, and why it’s a dandy.
What It Is Bro, What It Is
What it is is a full text search Rails plugin that uses the Sphinx search engine to allow you to search big tables for data that would take a long-assed time (and a lot of custom application code) to find if you used MySql full text searching.
What Are Your Other Options
In the space of legitimate Rails full text plugins, the commonly mentioned choices are Sphinx (via Thinking Sphinx or Ultra Sphinx), Xapian (via acts_as_xapian), solr (via acts_as_solr) and (shudder) Ferret (via acts_as_ferret).
Jim Mulholland does a great job of covering the various choices at a glance, so if you’d like a good overview, starts with his blog about the choices.
To his commentary, I would add that Solr looks complicated to get running, appears to have been abandoned by its creator, and hasn’t been updated in quite awhile. It should also be mentioned that if you were to choose solr, every time you wished to talk about it online you’d have the burdensome task of backspacing the “a” out of the name that your fingers were intent on typing in.
Xapian seems alright, but the documentation on it seemed lacking and not a little arcane. Despite Jim’s post on how to use it, it seemed like the Xapian Rails community was pretty sparse. My impression was that if it didn’t “just work,” it would be I alone who would have to figure out why. Also, from what I can tell in Jim’s post, it sounds like one has to stop Xapian from serving search requests to run the index? Update: the FUD patrol informs me that you can concurrently index and serve, oh what joy!
Ferret sucks. We tried it in our early days. It caused mysterious indexing exceptions left and right, whenever we changed our models or migrated. The day we expunged it from our system was the day I started programming our site and stopped worrying about what broke Ferret today.
Ultra Sphinx looks OK, but as you can read here, it’s ease of use leaves something to be desired compared to the star of our blog post, who has now entered the building. Ladies and gentlemen, may I present to you, hailing from Australia and weighing in at many thousand lines of code:
Thinking Sphinx!
There’s a lot to like about Thinking Sphinx: it has easy to read docs with examples, it has an extremely active Google Group behind it, and it supports useful features like location-based searches and delta indexing (e.g., search updates in real time).
But if there is one reason that I would recommend Thinking Sphinx above your other choices, it’s that you probably don’t care a hell of a lot about full text searching. Because I didn’t. I care about writing my website. This is where Thinking Sphinx really shines. With the tutorials and Railscasts that exist for Thinking Sphinx, you can write an index for your model and actually be serving results from Thinking Sphinx within a couple hours time. That doesn’t mean its an oversimplified app though. Its feature list is long (most of the features we don’t yet use), but smarts defaults are assumed, and its super easy to get rolling with a basic setup, allowing you to hone the parameters of the search as your situation dictates.
Also extremely important in choosing a full text search system is reliability. Programming a full text engine (and its interface into your application) is rocket science, as far as I’m concerned. I don’t want to spend my time interpreting esoteric error messages from my full text search engine. It must work. Consistently. All the time. Without me knowing anything about it. Thinking Sphinx has done just that for us. In more than a month since we started using it, it’s been a solid, reliable champ.
A final, if somewhat lesser, consideration in my recommendation of TS is who you’ll be dealing with if something goes wrong. Being open source, my usual expectation is that if me and Google can’t solve a particular problem, it will be a long wait for a response from a random, ever-so-slightly-more-experienced-than-me user of the system in question who will hopefully, eventually answer my question in a forum. Thinking Sphinxes creator Pat Allen blows away this expectation by tirelessly answering almost all questions on Thinking Sphinx in its Google Group. From what I can tell, he does this practically every night. This is a man possessed. I don’t claim to know or understand what’s in the punch he’s drinking (probably not beginner’s enthusiasm, since TS has been around for quite some time now), but whatever’s driving him, I would recommend you take advantage of his expertise soon — before he becomes jaded and sour like the rest of us.
What About the Performance and Results?
Performance: great. Our usual TS query is returned in a fraction of a second from a table of more than 200,000 rows indexed on numerous attributes. Indexing the table currently takes about 1-2 minutes and doesn’t lock the database. Nevertheless, we recently moved our indexing to a remote server, since it did bog down the system somewhat to have it constantly running. I plan to describe in the next couple days how we got the remote indexing working, but suffice to say, it wasn’t very hard (especially with Pat’s guidance on the Google Group).
Results: fine. I don’t know what the pertinent metrics are here, but you can use weighting for your results and search on any number of criteria. Our users are happy enough with the search results they’re getting with TS out of the box, and when we do go to get more customized with our search weighting, I have little doubt that TS will be up to the task, and it’ll probably be easy to setup.
Final Analysis
If you want to do full text searching on a Rails model, do yourself a favor and join the bandwagon enjoying Thinking Sphinx. It’s one of the best written and supported plugin/systems I’ve stumbled across so far in the creation of Bonanzle.
I’m Bill Harding, and I approved of this message.
Bill, I can hear your thoughts on this page vibrating through the earth’s crust. Your genius and humor are what make you tick!
(laughing) – cooler than a polar bear’s toenails
FUD patrol…
“it sounds like one has to stop Xapian from serving search requests to run the index? Yuck.”
That’s simply not the case – Xapian supports concurrent indexing and searching.
@olly: Thanks for the good news. I’ve added it to the main post.
Cool, thanks Bill.
Thanks. Your article has helped us a lot too đŸ™‚
Your article is spot on.
Thanks!
More FUD – the article states: “Solr looks complicated to get running, appears to have been abandoned by its creator, and hasn €™t been updated in quite awhile.”
None of those are true – though most Solr development has moved from CNet to a VC-backed company focused on developing it.
@rmxz, I see that Solr itself is still being developed, but what I’d meant more specifically in this post was that Solr for Rails was stale & abandoned. At least, that’s the impression I get from the first Google result when I look for “Rails Solr” and when I look at the Github project (http://github.com/railsfreaks/acts_as_solr) which states that the last update was in June 2007.