The Stronghold of Bioinformatics

Sat, Jan 5, 2013

No one likes gamification or MOOCs, as far as I can tell. Or rather, anyone trained in the hermeneutics of suspicion might find it hard even to accept their existence: it’s hard to come up with a hypothetical concept that would cry more piteously to the heavens for critique. True to form, until a few weeks ago I had never earned a badge in my life and would have regarded the prospect of doing so with contempt and a touch of pity for whoever was naive enough to suggest it.

Then there was this Metafilter post. Things I’ve discovered via Metafilter have taken away many months of work-time over the years, so the sensible thing to do would be to quit reading it. But that’s unlikely. In any case, Project Rosalind is a series of programming problems related to bioinformatics. It has the gamified features of “levels,” “badges,” “achievements,” and even, God help me, “xp.” The problems cover string processing, probability, and other topics. They are arranged in a tree-like structure, and you have to solve precursor problems before getting access to the later ones. Solving a problem involves downloading a dataset and submitting a solution within five minutes. After you’ve solved a problem, you can see the code that others have posted for it.

This feature is particularly interesting to me because I have never really learned functional programming: when I see solutions in languages such as Haskell, Clojure, or Scala to problems that I have already solved in Perl, it’s a bit easier to understand how they were put together. (Rosetta Code is another place to see programming problems solved in multiple languages.) You are allowed unlimited attempts to get the right answer, and you can see forum questions about the problem after two unsuccessful tries. (I have posted a question once, a rather idiotic one in retrospect, and I received a correspondingly withering response, whose impact I mitigated somewhat by imagining it spoken in Comic Book Guy’s voice.)

I have, at this point, solved twenty-two of the ninety-three problems. The early ones are trivial, but I’m finding that the difficulty scales up quite a bit. I’ve used some algorithms and data structures I had never worked with before, such as suffix trees and shortest superstring. I’ve also used arbitrarily nested loops in Perl (with Algorithm::Loops) and contemplated the theoretical limits of what a regular expression can match more than I’ve had to before. It’s also quite interesting to see what the total numbers of problems solved reveal about people’s background knowledge. Two of the problems involving Mendelian inheritance and probability have been solved proportionally far fewer times than (more difficult) string-processing problems. (I don’t mean to be a hypocrite in saying this, as I got tired of the Punnett squares required in the second one of those and haven’t solved it myself.)
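
For illustration only, here is a minimal sketch of how the Punnett-square bookkeeping and the "arbitrarily nested loops" idea fit together. It is written in Python rather than Perl, it is not a solution to either Rosalind problem, and the function names and the list-of-strings genotype format are assumptions made up for this example. It also assumes that genes assort independently. Python's itertools.product plays roughly the role that nested loops over a variable number of lists play in Perl.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Return {genotype: probability} for a cross of two parents.

    Each parent is a list of two-letter strings, one per gene,
    e.g. ["Aa", "Bb"]. Independent assortment is assumed.
    """
    # For each gene, every (allele from parent 1, allele from parent 2) pair.
    per_gene = [list(product(g1, g2)) for g1, g2 in zip(parent1, parent2)]
    # product(*per_gene) acts as one nested loop per gene,
    # however many genes there are.
    outcomes = list(product(*per_gene))
    weight = Fraction(1, len(outcomes))
    dist = {}
    for outcome in outcomes:
        # Sort each pair so that "aA" and "Aa" count as the same genotype.
        genotype = tuple("".join(sorted(pair)) for pair in outcome)
        dist[genotype] = dist.get(genotype, 0) + weight
    return dist


def prob_all_dominant(dist):
    """Probability that every gene shows at least one dominant (uppercase) allele."""
    return sum(
        p
        for genotype, p in dist.items()
        if all(any(c.isupper() for c in gene) for gene in genotype)
    )


if __name__ == "__main__":
    cross = offspring_distribution(["Aa", "Bb"], ["Aa", "Bb"])
    print(prob_all_dominant(cross))
```

For the dihybrid cross AaBb x AaBb this gives 9/16, the first term of the familiar 9:3:3:1 ratio. Adding a third gene only means adding a third string to each parent's list; no extra loop has to be written by hand.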

Some of the gamified features of the site I regard as silly (levels, xp, badges, achievements), but I admit that I can’t help but be motivated by the statistical information about how many people have solved which problems. It triggers my instinctual competitiveness, somehow. They even seem to encourage people to post their country of origin to introduce nationalism into the competitive mix here. As a learning tool, I’m not sure how effective it is. It’s quite possible to solve many of the problems while retaining only the barest minimum about the underlying molecular biology, and problems which require a bit more conceptual understanding than that (see the Mendelian inheritance ones above) are comparatively ignored.

The programmatic checking of solutions is also somewhat finicky. An end-of-line character at the end of the file will cause an otherwise correct solution to fail for at least some of the problems, for example. But all in all, I’m very impressed with this site and think it has a lot of potential in teaching people (humanists, for example) how to program. It would be nice to be able to reuse the code with different problem sets, if they ever decide to release the source in the future.
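
One way to sidestep the end-of-line problem is to build the answer as a single string and write it out without a final newline. The sketch below is in Python, and the helper name format_answer is made up for this example. It assumes only what is described above, namely that a trailing end-of-line character can trip the checker; nothing else about how the site compares answers is known here.

```python
def format_answer(lines):
    """Join answer lines with newlines, leaving no end-of-line at the very end."""
    # Strip any line ending each item already carries, then join,
    # so separators appear only between lines.
    return "\n".join(str(line).rstrip("\r\n") for line in lines)


if __name__ == "__main__":
    import sys

    # write() adds nothing of its own, unlike print(), which appends "\n".
    sys.stdout.write(format_answer(["ACGT", 42]))
```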

UPDATE:

I corrected a few mistakes (I gave myself an extra problem, for instance), and I also wanted to mention an important precursor: Project Euler. This site has mathematics problems, and it also seems a bit more streamlined. I haven’t actually used it yet, though.