Won in Translation, by David Jacobs

Clojure: All grown up

2013-03-05T00:00:00+00:00

I want to convince you of one thing: You should take a look at Clojure. It will simplify your coding life and speed up product development. It will clarify how you think about structure and complexity. And–if you like avoiding unnecessary frustration and boilerplate–it will make you happy. So reconsider the plans you had to use Rails or Django or Play to build your next business, and don’t look back. Clojure is ready for prime time.

I’m serious.

Clojure is the most pleasant language I’ve ever worked with, and that’s after 8+ years of Ruby. It brings a different paradigm to the table, changes the way you think about code. In fact, telling you everything I’ve learned from Clojure would take a book and then some, and I can’t possibly tell it all here. But I do hope to at least whet your appetite.

What is Clojure, and why should I use it?

Clojure is one of the newer kids on the block, and I like to think it’s shaking things up. Though the language rests on top of the JVM, a benevolent host, it looks and feels nothing like Java. Clojure takes advantage of the years of work that have gone into optimizing the JVM. But it kicks up the level of abstraction you can work in by about 10. (That’s 10 arbitrary units, if you’re keeping score at home.)

At a high level, Clojure makes life simpler and clearer–whether you’re building a Web API, a machine learning algorithm, or the ultimate music synthesizer. Let me tell you about some of my favorite parts.

Functional programming is standard. Functional code tends to be clean and easy to reason about. Instead of thinking in low-level constructs (like imperative for loops), you think in terms of what you want to get done. You write it in a natural way, and the language optimizes for you. You can use mutable state if you think it’s absolutely necessary. But most of the time, you’ll want to contain that in just a few places, and Clojure encourages you to do that. Studies have shown that reading that last sentence can cause light-headedness, fever, and feelings of mild indigestion. Nevertheless, once you move to a mostly immutable world, you’ll wonder how you ever survived with so much state floating around, running into other bits of state–just like you probably wonder why you ever put up with pointer arithmetic and memory deallocation.
Clojure is built for the real world. Clojure likes–but doesn’t enforce–functional purity. For the cases where it’s necessary to model data using state (see: databases), Clojure gives you safe ways to handle it. And get this: You don’t need a PhD in Monads. (Huzzah!) Now, don’t get me wrong–I love Haskell. But as an engineer, I sometimes feel like Clojure is the forgiving parent, and I’m the semi-responsible teenager. It will try to help me live a good life when it can … but when I sneak into the beer cabinet, it looks the other way. (After drinking it, by the way, I immediately feel ashamed. Because really … Smirnoff Ice?)
Clojure redefines the fundamental units of code. Instead of thinking in terms of loop boilerplate, guard clauses, “what if this isn’t initialized yet?”, off-by-one errors, design patterns, serialization, or semicolons, you get to think about data transformations. About defaults that are ultimately flexible. Rubyists know that their language effectively got rid of low-level for loops. In the same way, Clojure gets rid of imperative iteration in favor of declaration. Your thoughts shift away from place-oriented ideas like memory addresses and gravitate to data structures and functions like map, reduce and filter. Your “class hierarchy” turns out to be a type system that happens to also lock away well-meaning functions into dark dungeons (more on that in another article), and getting away from that is freeing. Transitioning to an open, abstract world means the intention of your code isn’t obscured by artifacts of computing, by the fact that your code is using and reusing finite memory. Your code is modeled as data, too, by the way. That means you can manipulate it using the same functions you use to manipulate strings and lists and maps. This shift dramatically simplifies and levels-up the tools available for solving new problems.
Clojure code tends to be insanely beautiful. More importantly, though, beautiful code happens to be both readable and efficient (after you learn to love or ignore the parentheses). (If you’re skeptical, you should read up on persistent data structures.) Speaking of beauty, in Clojure, the physical shape of your code is a nice indicator of how clean it is. For example, when you use side effects in code (like writing to disk, logging, throwing exceptions), it becomes pretty obvious from indentation and !s in method names. And an excess of parentheses at the end of a function call tells you that you may want to break it up into smaller functions. So when you’re looking to clean up code, a quick visual scan can often tell you where to find candidates for refactoring.
Clojure is primed for parallelism. Functional code is easy to reason about, regardless of how many threads you’re working with. For non-functional concerns like database manipulation, Clojure has brought several innovative concepts from academia into the mainstream. (In fact, several of these have seen adoption in other languages, too.) For example, you have Clojure’s software transactional memory (STM) at your disposal–it will help you coordinate state change in a wide range of scenarios. So when you do have your Twitter moment and need to scale, you have loads of options to do so, from lightweight to industrial. You won’t have to change stacks or even the way you think about code. Maybe you’ll convert a (map ...) call to (pmap ...) to make it parallel. Or maybe you’ll decide to use agents or refs coordinated by STM. What you won’t be doing is spending weeks scratching your head over semaphores and mutexes.
You can forget context. Why? Because there is no context to learn. Most of your code takes input and produces output, and the pieces of code that don’t follow that pattern tend to stick out. On top of that, each file states its own dependencies, so you’re never left guessing where that strange symbol came from. (“Did one of my libraries monkey-patch my code? Is that a local var or a method name? Did I inherit that, or is it in a mixin?”) Given this explicitness, you can move between abstraction layers when it’s appropriate, but you’re rarely forced to. Each layer also stands on its own. This means it’s relatively easy to ramp up new engineers on one part of your codebase at a time. Lack of context also means that any change you might want to make to a library is as easy as wrapping a function from the library you’re importing. Never again will you have to bite your nails as you monkey-patch a monkey-patch to try to tweak that opinionated library ever so slightly to meet your needs. Just swap out a context-free function!
The community values simplicity. Rich Hickey, Clojure’s inventor, is focused on disentangling unrelated concepts. He talks a lot about the benefits that gives to a codebase one, two, even five years out. So naturally, simplicity is one of the community’s key values. You’ll often hear the term “complect” when you’re talking with Clojure engineers. The word is an ancient one and was revived when Rich used it to describe code that ties unrelated concerns together. (His examples included the class construct, stateful variables, and conditional statements.) Because Clojurists are bent on decomplecting, most libraries compose with each other. This means you can build up a dependency tree that matches your domain requirements instead of basing that decision on which libraries are meant to play nice with each other. Switching out a templating engine or database layer should not be hard at all, because at the end of the day, if a library’s API has been designed correctly, functions are the actors, the heroes. And what’s more composable than a function?
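
To make the shift from imperative iteration to data transformation a little more concrete, here is a minimal sketch. It is written in Python rather than Clojure, purely for illustration, and the order data and function names are invented for the example. Both functions compute the same total; the second one only describes the transformation (map, then filter, then reduce) and leaves the looping to the language.

```python
from functools import reduce

# Hypothetical data: (customer, amount) pairs, invented for this example.
orders = [("ann", 30), ("bob", 5), ("ann", 12), ("cy", 50)]


def total_large_orders_loop(orders, minimum):
    # Imperative style: manage the index, the guard clause and the
    # accumulator by hand.
    total = 0
    for i in range(len(orders)):
        if orders[i][1] >= minimum:
            total += orders[i][1]
    return total


def total_large_orders(orders, minimum):
    # Declarative style: pull out the amounts, keep the large ones,
    # then fold them into a sum.
    amounts = map(lambda order: order[1], orders)
    large = filter(lambda amount: amount >= minimum, amounts)
    return reduce(lambda acc, amount: acc + amount, large, 0)


print(total_large_orders_loop(orders, 10))  # 92
print(total_large_orders(orders, 10))       # 92
```

The second version has no index to get wrong and no accumulator to forget to initialize, which is roughly the kind of boilerplate the list above is talking about.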

Clojure has a lot more going for it too. Maybe I’ll be able to tell you more about its Java interop, concurrency primitives and macro systems some other time. For now, rest assured: this language is awesome.

What’s it missing?

Now, every language decision is a trade-off, and Clojure has its own set of trade-offs, too. While at the language level, I think most people like the trade-offs that Clojure makes, some common concerns are:

Error messages. While Clojure is beautifully designed, it still rests on the JVM and has some pretty ugly stack traces and error messages. This is steadily improving over time, but hopefully this will become a top priority soon.
Debugging. Clojure has never had a great debugger. Some say that once you are comfortable with Clojure, you won’t need one. (They recommend you use a REPL to execute functions that you’ve constructed over predictable data.) However, in my experience, most engineers are used to having full-context debuggers at their disposal, and I don’t blame them.
Startup time. This isn’t a big deal except when writing one-off scripts, but the JVM takes a good second or two to launch, which means that writing instant scripts or CLI tools in Clojure isn’t a great choice right now.
Ecosystem. The Clojure ecosystem is vibrant and evolving. However, given that it’s only a few years old, there is still a gap between what we have now and what Rails and Django offer.

All of these are valid concerns, but in my experience, they’re the kind of thing you quickly learn to work with, and they’re worth it for having such a powerful language and set of libraries (the JVM’s) at your disposal. Of course, your team will have to weigh these trade-offs for itself.

Why now?

So the question is, why is now the right time to build your product using Clojure? You were planning on building on Rails or Django. “Can I even hire for Clojure?”, you might be thinking.

These are fair questions. Clojure has been around for over five years now, and I have only felt like I could recommend it in the last couple of years. For a while, the tooling situation was iffy, the language was in flux, and the community had not solidified. But all of that has changed:

In Chas Emerick’s annual State of Clojure survey, it became obvious that Clojure adoption is happening. Plenty of people love Clojure and are looking for a job. More are trying it out and sticking around.
Community projects like ClojureDocs, Clojure Toolbox and ClojureWerkz make library discovery easy.
Leiningen and Cake joined forces to become an all-powerful build tool. Then Leiningen reached version 2. (And let me tell you, Leiningen 2 alone makes Clojure worth using.) On the one hand, Leiningen makes it possible to reap the benefits of Maven without knowing anything about it. On the other, its tasks system will automate anything you want using Clojure.
Vim and Sublime are as viable as Emacs for Clojure coding.
Midje fixes problems with clojure.test and makes top-down testing possible and fun.
Datomic solves the versioned database problem everyone seems to be facing right now in today’s “Big Data” world.
Immutant lets you deploy your Clojure Web app to a JBoss server and take advantage of JBoss’s scalability without any XML configuration. You get a mature enterprise-ready Java server without the pain of Java or of configuration. Feels almost like cheating, doesn’t it?

If it matters to you, by the way, Clojure is officially in Thoughtworks’ technology adoption ring, which means that the firm recommends it for production use right now.

Oh, and big names have already adopted Clojure. There is a running list of companies using Clojure, and it includes companies like the Climate Corporation, Akamai, Flightcaster and BackType (now owned by Twitter). Prismatic (née Woven) is a Clojure team, and according to Quora, Amazon is on board with the language, too.

(For one other take-on-the-world company that’s decided to hop on the Clojure train, check out the end of this article.)

What you’re thinking

Okay, so you’ve just heard the high-level benefits of Clojure, and you know that successful companies are using it right now. But you’re skeptical. You might be thinking:

I don’t need to analyze lots of data; I don’t need to scale yet

Clojure is not a language dedicated to math or big data. While it happens to be better at dealing with math and physics and machine learning than most other languages (especially the imperative ones), it is a general purpose language. It does everything that Ruby does, in a scalable way.

So even though scalability and mathematical prowess are things Clojure excels at, don’t think of it as “a language for number crunching and statistics”. The real benefit of Clojure is its tendency towards functional purity. The way it makes you want to get rid of mutable state, or at least contain it in a box. The way it gets you to think of almost all code, even frontend code, in terms of data flow. The higher-level fundamental units it adds to your toolbelt. All of these let you squash bugs before they happen, create parallel code without agonizing over locks, and ramp up developers right away on a new codebase.

Functional programming is a fad

Another objection I’ve heard is that functional programming is a fad and isn’t a way to build a long-term business.

Functional programming may be having a resurgence, but it is not new and it is not a fad. It has been around at least since the days of Scheme and is a proven model, having more than 60 years of research behind it. Being so well thought out, the functional philosophy tends to irreversibly change your brain, change the way you think about data flow. It’s one of those things you adopt and never want to leave. And given the increased need for concurrency, it’s unlikely that the functional style will go away any time in the near future.

So if you’re thinking to yourself, “I don’t want to use the latest language or library, I want old, boring technology”, remember that you’re dealing with the confluence of the oldest (Lisp) and most boring (Java) technologies available.

The ecosystem is small

This is one objection I’ve heard, and I do think there’s merit to it. Though I think we’re past the early adopter phase of the adoption cycle, the Clojure ecosystem isn’t big enough to have worked out all possible kinks or problems that you might find. If your team doesn’t have a tolerance for solving pain points on their own (for example, training your engineers on this new way of thinking or spending time to make editing code awesome), Clojure might not be the best fit. However, if you’re willing to eat the up-front cost, I can tell you that what you find on the other side will almost certainly be worth it. (And finding people who are passionate about Clojure shouldn’t be hard, either.)

Starting down the rabbit hole

Even though this article is far from comprehensive, I hope I’ve coaxed you into thinking that you would be a happier, more productive engineer if you were using Clojure. Or at least that you should find out more.

If you do want to know more about the philosophy of Clojure and simplicity, a good place to start is with two of Rich Hickey’s best talks (in my opinion), Simple Made Easy and The Value of Values. No one explains these concepts like Rich. Moving on from there, I’d recommend reading one of three books. They are, affiliate links included, the following (feel free to remove the affiliate code):

Clojure Programming: This is a comprehensive book on Clojure. It starts with the basics, including rationale, but does not stop there. If you want a one-stop book that explains everything from the reduce function and functional coding to database connections and Web programming, this is the book for you. It also happens to be the most recently published book, so it’s got the most current information. Also, like all O’Reilly books, it has great typography. (Get the printed version!)
Clojure in Action: Clojure in Action is a great book about how to use Clojure day-to-day, for example, how to interface with Postgres, RabbitMQ and Map/Reduce.
The Joy of Clojure: The Joy of Clojure is my favorite book on Clojure. It is deeply philosophical but also approachable. It covers the most advanced material of all three. Focusing less on the practical than on the theoretical, this book aims to explore what it means to code in Clojure in the twenty-first century. I highly recommend it, but maybe not as your introduction to the language.

While you’re reading the book, you should check out the mailing list and follow as many Clojure contributors as possible. Maybe subscribe to the Clojure Gazette or def newsletter or follow Planet Clojure on Twitter.

Finally, feel free to stop by here once in a while to say hey. I’m hoping to pick up writing more about functional experiments and code architecture now that I have more time to think about these things, and I hope to see you again … which brings me to my last point …

One more thing

At Minerva, we have decided to implement our platform in Clojure. So “you should use Clojure” isn’t something I’m saying lightly. Rather than being a hindrance to hiring, we’ve seen it boost interest from people who were otherwise not looking for new opportunities. (Psst … This is how you fight the big cos. for talent.)

We’ve been using a Clojure stack for our production code, and have seen that the codebase is already cleaner, leaner, and more extensible than any we’ve worked with in the past. Being able to tailor abstractions to our domain means that we can move quickly and focus on features instead of low-level thinking. If you’re interested in finding out more, I’d love to talk to you. And make sure to say hi at Clojure/West!

My path from medicine into technology

2011-06-28T00:00:00+00:00

The draw of technology

Until 2010, I planned to be a doctor, but now I’m happily building software. Why?

I should start off by telling you: Tech is not new to me. Computers have been a hobby for a long time–since I was 10 years old, actually. Back then, it was BASIC on a VTech learning laptop. I remember my sheer excitement when I learned how to tell my computer what to do, and that I could get it to ‘remember’ things for me. When I was in high school, it was Java and C++. Bloated, overdone, maybe? But useful. Now it’s mostly modern and agile languages like Ruby and Clojure–and Haskell, just for fun.

In school, I toyed with the idea of electrical engineering, but ended up choosing medicine. I wasn’t sure then how electrons could help people. With medicine, it was obvious.

Hard work, I can deal with that

I’ve been ready for the challenge and reward of medicine since college. With a little hard work, I knew I could make discoveries and then, after med school, apply that experience to help people directly. The journey would be long, but at the end, worth it.

And I did, in fact, discover exciting things while doing research. At school and then at Massachusetts General Hospital, I learned firsthand how pathogens like F. tularensis, V. cholerae, M. tuberculosis, and S. typhi worked, and even got to publish. And those were exciting times.

But being surrounded by the medical community, it turns out, gave me a glimpse into my potential future, and it’s one that’s hard to stay excited about.

Being surrounded by friends in residency, I keep hearing:

Medicine isn’t really about spending quality time with patients anymore
Your personal relationships with your patients are there, but only in 15-minute increments
As a doctor, you have to worry more and more about the bottom line

Now, since the reason I got into the field was to connect with people and help them, hearing this made me reconsider the tradeoffs. So between tech and medicine, what made me happy?

Technology and me

Being on the cutting edge of tech, pulling the power out of a computer; that’s what fascinates me. A microchip works faster than I can really understand, gives me reliable results. Will rinse-wash-repeat for me ten million times without complaining. Harnesses electron flow and remixes it into DNA sequences and military plans and the Social Web.

Having a computer in front of me is like having a tiny gnome ready and willing to do anything, no matter how mind-blowing, as long as I know its language. (In fact, just this month I discovered the James project on Github–it’s fun to turn your Mac into a personal butler.) A computer will crunch numbers all day. Or fetch and sort the day’s news in the morning. Or show me a video of almost anything I can type into a search box. (Enlightening Task of the Day: search for ‘magic power ball’ on Youtube.) Or download ~~movies~~ Linux images all day. Or tell me how dynamic my writing is. It doesn’t get bored, and it doesn’t fall asleep.

A computer, for me, is a second brain. And a third and fourth. It lets me get a lot more done than I can alone, work closer to perfect, gives me free time to keep up with the pace of progress. And this year, technology has changed me. Where before, I enjoyed tech as a hobby, I now feel confident about making it into a career. My skills have snowballed, and I’ve reached a turning point. I’m officially more interested in helping people by, say, designing electronic medical records than with a stethoscope.

Reconsidering medicine

So why does the Web trump med school? Don’t I want to change the world?

Well, the answer to that question isn’t easy. In a way, I’m still in love with medicine. Being on the cutting edge of biology–and especially infectious disease–thrills me. And being in an industry that focuses on curing people, with a huge focus on the poor, has been rewarding. I’ll miss that. So medicine will always be a passion, in one way or another, and I won’t lose the drive to help my fellow human being.

But I’m not going to be a physician. I don’t deny that life could’ve been nice: There’s honor in the profession—and a lot of meaning. Every day brings, to some extent, a fresh challenge, an exotic disease or emergency you haven’t seen before. The balance between art and science and humanity is nothing short of beautiful.

But medicine … well, you see, medicine isn’t perfect. It’s full of politics and routine chores, and it is not all merit-based. There is too much tradition and too little innovation, and I don’t see it as a beacon of a meaningful career for me. Since college, I’ve seen exactly how the field doesn’t grok the power of technology, and healthcare/research is suffering because of that.

On the flip-side, the startup scene has shown me that the puzzle-solving side of computer science can be really meaningful. (And building something that millions of people depend on every day has got to be fun, too.) Avoiding med school will leave me with time and money to help out real people right now and solve problems that are relevant to my life.

So what, specifically, makes tech more attractive? What other factors made me reconsider the life of a doctor?

1. You can’t tackle medicine in your free time

If I find a new technology I want to try out, I can do that before bed or with friends at a coffee shop. All I need (usually) is a laptop and a clear head.

In medicine, if I find a new idea in social justice or a new vaccine candidate, what can I do? Read. And read. And read. There’s no getting my hands dirty outside of work, and that puts a damper on being creative. It means I can’t always be learning. (Or at least, that learning has to be straight from the books.) And for me, it’s always going to be harder to dive deep into a book when I won’t be able to apply it for a few years.

This doesn’t make medicine ‘bad’. But it means that I can engage technology on a much deeper level.

2. Medicine isn’t the place to see the best of technology

And for someone who loves technology, that’s excruciating. I applied to medical schools last year, and I can tell you that no med school has ever heard of agile development or table-free layouts or version control or data portability (scary, since they’re the ones deciding which medical records systems to buy).

In the lab, copying a Word document and adding a timestamp is considered version control, because there’s no established way of doing anything better. There is a 75% chance that reading that last sentence made you shudder.

3. Med school means delaying the future

Everyone talks about how meaningful the life of a doctor is. And I do think that practicing medicine can be a noble thing. But there are huge drawbacks, too.

You’re not a full contributing member of society until you’re almost 30, for example. I am, by nature, a doer and can’t stand the thought of waiting five more years to get my hands dirty in the real world. I want to get things done now.

Even as a doctor, your life and work are only as meaningful as you make them. Without effort, it can be routine, full of false alarms and cynical. On the other hand, I see a lot of engineers who get a thrill from problem-solving by day (for their own satisfaction) and helping people outside their very manageable working hours. So nobility isn’t cut and dried.

I know plenty of extraordinary doctors. I don’t want to diminish what they do, not at all. But I do want to kill the idea that becoming a doctor makes you a good person.

4. Learning about tech is open to everyone

A lot of techies who build the Web (conveniently) enjoy writing about the Web. That means that a lot of the quality content on the Web is, itself, about technology. And because content on the Web is mostly open, that means that quality content is available for anyone who wants it. It’s easy to start in this industry as an amateur and find yourself surrounded by interesting, accessible information. (So much so that you could, say, make a career out of it.)

Now, I know engineers are thinking to themselves, “What’s the big deal?” But this just doesn’t happen in medicine. Instead, probably because the field is so old, knowledge is often locked away behind paywalls or in text books. (That is gradually starting to change. The Public Library of Science is definitely a good thing.) It’s interesting to me that this is the case, and I’d love to see more medical blogs flourish like they have for tech.

Wrapping up, starting fresh

After a lot of thought and too many mind maps, I’m sure about the direction I’m taking. I’m excited to focus squarely on technology, to justify the time I’ve spent on it, and to advance the state of the Web. I love a good puzzle, and tech is a fulfilling, social, egalitarian way of getting that in my professional life. And who knows? Maybe some day–I keep telling myself–I’ll find a way to blend the two into something even more meaningful.

The ‘ugliness’ of Python

2011-01-23T00:00:00+00:00

Not what you think

I want to take on a big question, one that nobody’s really answered: which language is more beautiful, Ruby or Python?

You read that sentence and have already decided this is a flame war waiting to happen. But hear me out. I’m actually not going to call Python “ugly”, much as I started out thinking just that. Instead I’m going to look at differences to build a case for how each language wants us to code. A couple of notes before I start:

This is a big topic. Hence, a substantial article.
I’ve coded Ruby for 6 years and Python for 2 months. (Read: I do not know everything about Python.)
I learned Python via The Quick Python Book (Manning).
I’ve picked some syntactic and semantic points that separate Ruby and Python. I’m only going to focus on those points.
This is a functional perspective, at least, as much as possible in a multi-paradigm language.

With that said, let’s dig in.

Hooks & humanity

When I first came across Python’s hooks (__len__(), __str__(), __init__(), etc.), I was pretty horrified. I sat down to pen an epic diatribe against Python and its aesthetic flaws. About how I could never actually like coding in a language with syntax as heavy as that.

If you’d have seen the post I had planned, it would’ve read something like a rant:

I hate the underscore methods. Presumably they are underscored–four times, mind you–to avoid naming conflicts with methods you might want to create. Okay I get it, so if you want to define your very own init() method that isn’t a constructor, you can. That is, if you ever actually need to define an init() method that doesn’t … er … init. You can probably already tell: I come from a Ruby world. I like conventions that don’t surprise me. Ruby’s got hooks, too, after all, but they’re simple and (at least to me) make sense. If I ever do want to make an initialize method that doesn’t initialize, well, I’ll get over it and come up with a new name.

Then I talked it out with a therapist, who said I should be more worried about things like global warming and AIDS in Africa.

Now, in my mind, these method names are definitely ugly. They aren’t short, clean or straightforward. I don’t think anyone is actually happy typing the extra underscores.

But maybe vagueness is the point. Maybe Python wants to steer us away from these methods, at least until we really need them. If __len__() isn’t something you have to define every day, why not indicate that with underscores?

I think I can get behind that.
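
For readers who haven’t met these hooks yet, here is a minimal sketch of how they get wired up. The Playlist class and its contents are hypothetical, invented only for this illustration. The point to notice is that you define the underscored names once and then almost never type them again: callers go through Playlist(...), len() and str().

```python
class Playlist:
    """Hypothetical class, used only to show how the hooks are wired up."""

    def __init__(self, songs):
        # Called by Playlist(...) when the object is constructed.
        self.songs = list(songs)

    def __len__(self):
        # Called by the built-in len().
        return len(self.songs)

    def __str__(self):
        # Called by str() and print().
        return ", ".join(self.songs)


mix = Playlist(["Intro", "Outro"])
print(len(mix))  # 2
print(str(mix))  # Intro, Outro
```

Seen this way, the underscores mark the seam between the object and the language’s built-ins rather than an everyday API.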

“You’re reading too much into this”. I’m sure you’re thinking it, and maybe you’re right. But I know one thing. When I define __init__(), it seems hackish, wrong. Rather than feeling like I’m using a normal feature of object-oriented program design, I feel like I’m going rogue; off the beaten path. I mean, what other reason would there be to surround a name with so much negative space?

(I think of it as a moat.)

And so I don’t define objects in Python. It doesn’t feel natural. I use maps and lists and functions as much as possible and leave objects to libraries. Turns out that’s a great way to actually get things done.

Functions, meet objects: Python’s take

Python, even though it’s a multi-paradigm language, makes functional programming fun. We can create pure functions without much work and pass them around as variables.

But you can’t always use functions with Python’s extensive libraries. (And libraries are why I learned Python in the first place.) Why not? Because lots of Python’s libraries are built out of objects, and it’s not straightforward to combine our favorite functions (for example, map) with object methods.

Now, we can, of course, deal with objects in Python without resorting to loops and if statements, but the most comfortable way is to use list comprehensions rather than map and reduce. Let’s take a quick example. Where Ruby encourages me to capitalize every element of a list by mapping the capitalize method onto each letter …

letters = ['a', 'b', 'c']
letters.map &:capitalize
# => ['A', 'B', 'C']

… Python encourages me to use a list comprehension:

letters = ['a', 'b', 'c']
[a.capitalize() for a in letters]
# => ['A', 'B', 'C']

It is possible to map object methods onto data structures in Python:

list(map(str.capitalize, letters))  # list() is needed on Python 3, where map is lazy
# => ['A', 'B', 'C']

However, you don’t see that much in the wild, and I’d guess that’s because Python doesn’t make it feel really natural to work this way. (In my opinion, it’s conceptually more advanced than mapping a simple function. How is self determined here, for example?)
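
One way to answer the question about self, sketched here under the assumption that every element is a plain str: looked up on the class, str.capitalize is just a function whose first parameter is the instance, so map hands each element in as self. The standard library’s operator.methodcaller is an alternative that looks the method up by name on each element instead.

```python
from operator import methodcaller

letters = ['a', 'b', 'c']

# str.capitalize is an ordinary function whose first parameter is the
# instance, so these two calls are equivalent:
assert str.capitalize('a') == 'a'.capitalize()

# map() passes each element as that first argument, i.e. as self.
via_class = list(map(str.capitalize, letters))

# methodcaller looks the method up by name on each element instead,
# so it also works when the elements are not all of one class.
via_name = list(map(methodcaller('capitalize'), letters))

print(via_class)  # ['A', 'B', 'C']
print(via_name)   # ['A', 'B', 'C']
```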

I’m not especially happy about using list comprehensions for basic data structure transformations. Should we really be pushed to abandon our favorite functions just because we want to call a method instead of a function? What’s more, I think that, for some things, list comprehensions distract us from the classes of data transformations that we’re actually doing.

Now, I’m not discounting list comprehensions. They’re elegant, powerful and have lots of applications. A great example is identifying palindromes:

s = 'string-with-palindromes-like-abbalabba'
l = len(s)
[s[x:y] for x in range(l) for y in range(x,l+1) if p(s[x:y])]

Assuming that p() returns true for all palindromes, that one line gives us all palindromes in s.
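The post leaves p() undefined. As a sketch under my own assumptions, here is one possible predicate: it treats a string as a palindrome when it reads the same reversed, and requires at least two characters so that empty and single-character slices don't flood the result. That length rule is a choice I made for the example, not something the original specifies.

```python
def p(text):
    # Hypothetical predicate: a palindrome reads the same in both
    # directions; require two or more characters to skip trivial hits.
    return len(text) > 1 and text == text[::-1]

s = 'string-with-palindromes-like-abbalabba'
l = len(s)
palindromes = [s[x:y] for x in range(l) for y in range(x, l + 1) if p(s[x:y])]
```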

In Ruby, things are different.

Hurdles for Ruby

Ruby is beautiful because it balances consistency with convenience. Consistency makes functional programming accessible in many cases, but it also stops us from using some of the more advanced functional programming constructs. I see at least three hurdles to functional programming in Ruby:

Ruby needs extra syntax for function passing
There are no pure functions in Ruby
Ruby will not let us map over top-level def methods (TLDMs)

1. Syntax

This is not much of an objection, but I do want to mention it. Pythonistas will say, “Python lets me pass and call functions without any effort, but Ruby makes it hard!” That’s true, function passing is painless in Python and a little harder in Ruby. The issue is that in Ruby, naming a function automatically calls it. (This is great for writing DSLs without parentheses, but the cost is pretty high.) So you have to precede function or method names with an ampersand if you want to pass them around. And while coll.map &f isn’t a deal breaker, it is a hurdle to function passing, and I find that most new-to-intermediate Ruby programmers tend to stick to non-reusable blocks rather than function passing for most application code.

2. Pure functions

I’m going to go ahead and say it: Ruby doesn’t have functions. And how can you program functionally without functions? (Ouch.)

Let’s step back and think for a second.

Convention says Array#map is not a pure function. Why? Because it relies on data that you’re not passing to it as a parameter–the array itself. That is, numbers.map &:to_s could give different values in the same code–even though its argument doesn’t change–because numbers carries its own logic and state. Put another way, the map method draws on more information than you pass in to return a value.

So that’s that. Object-orientation, by definition, means no pure functions. Right?

Well, let’s think about this for a second. It would be pretty trivial to make map look like a pure function:

module Kernel
  def map(coll, &f)
    # f is a Proc here, so it needs the ampersand to be passed as a block
    coll.map(&f)
  end
end

Ruby likes to group functionality into classes, so you don’t see this in the standard library. But this is a possible patch. And it would almost look like Python.

I think this patch has a valuable point, even if we don’t implement it.

Maybe, just maybe, we should think of coll.map &f as a real, pure function–one that takes coll and f as arguments with a special syntax. One that translates without contradiction into map(coll, &f). Is this the path to enlightenment?

Of course, there’s really no way of forcing a method to be a pure function in Ruby, especially because a method always has implicit access to self. However, it’s at least possible to code this way for teams that want to avoid the problems that come with mutable state.

3. Top-level methods

In Python, to create an algorithm f(x) and apply it to an array of arrays, we can use the standard def syntax:

def f(x):
    # My algorithm goes here

map(f, data)

In Ruby, though, we can’t pass around these top-level def methods (TLDMs) because they belong to an object. And that makes functional programming difficult. If I want to create an algorithm and then apply it to a list of numbers, I have four options:

# Option 1: Bulky lambda
f = lambda do |x|
  # algorithm
end

data.map &f

# Option 2: Monkey patch
class Array
  def f
    # algorithm
  end
end

data.map &:f

# Option 3: Explicit block
def f(x)
  # algorithm
end

data.map {|x| f(x) }

# Option 4: Method method
# (Thanks to several of you for this suggestion.)
def f(x)
  # algorithm
end

data.map &method(:f)

Each of those solutions has problems or is clunky. There’s really no getting around this, mainly because of the “Ruby immediately calls any method that you name” issue we talked about earlier.

So Ruby’s problem is the opposite of Python’s: In Python, function passing is easy, and object-oriented programming is okay. But the two don’t mix in a really seamless way. In Ruby, though, functions and objects mix pretty well (if you consider Array#map a function), but pure functional programming is a little awkward and could be easier.

The more I’ve compared Ruby and Python, the more I’ve come to appreciate Ruby’s block construct. It solves several problems at once and is really pretty elegant.

General mayhem: blocks & transformations

Blocks have a big impact on Ruby code, especially object-oriented functional code. The best way to illustrate the effects I’m talking about is to dive into an example.

Example: Stock market. Suppose I have a set of news articles in Markdown format. Each article mentions a company zero or more times and links to it. Now, say I decide to count how many times a company shows up in one day’s articles. Because Big Data.

Each company name is listed by its URL each time, so if I see https://apple.com three times in an article, I know Apple is mentioned three times. (That’s “thrice” if you’re joining us from 17th century England.) After I extract all the domain names and capitalize them, I’ll compare them against a master list of companies I care about and decide which company is most popular for that day’s articles.

Now in Java, this is a chore. I can hear committees forming already. But in Ruby and Python, it’s a quick hack. In fact, I’d probably code it in a REPL.

Ruby’s blocks let us chain methods over several lines. More importantly, they let us mix internal object transformations with broader list transformations. This is powerful stuff:

# Companies are listed in 'companies', one per line
companies = File.read('companies').split("\n")
articles = ['article1.md', 'article2.md', 'article3.md']

# Simplified URL matcher
http_regex = /https?:\/\/(?:\w+\.)*(\w+)\.(?:com|org|net)/

# NOT the simplest implementation, but it shows off blocks
# scan returns one array of captures per match; each one
# holds a single element, our domain name
# (def can't see top-level locals, so we pass them in)
def parse_article(a, http_regex, companies)
  File.open(a) {|f| f.read }.
    scan(http_regex).
    flatten.
    map {|x| x.capitalize }.
    select {|x| companies.include? x }
end

articles.map {|x| parse_article x, http_regex, companies }.
  reduce(Array.new) {|x,y| x.concat y }

In fifteen lines of code, we use blocks six times for different reasons. If we wanted, we could also use blocks to build anonymous functions, continuations or loop over collections with side effects (via each). For the record, though, I don’t really like each.

Before I move on to the Python implementation, I have to show you how I would actually code this. You see, blocks are awesome, but too many braces get overwhelming. It’s nice to whittle down and organize your code a little when you use them all the time. I tend to give my blocks names (essentially, they’re functions) and keep them separate from my method chains and transformations. Here is equivalent code:

important = lambda {|x| companies.include? x }
articles.map do |a|
  File.read(a).scan(http_regex).flatten.
    map(&:capitalize).select(&important)
end.flatten

Flows a little better, no? And yes, people, that transformation took just five lines of code.

Aside. The mini-point here? Blocks are so important that we have loads of ways to write them. Without tons of braces.

Python doesn’t give us blocks or the end keyword, and whitespace matters, so chaining isn’t always awesome. If we want to do a direct translation of the code above, we can either turn to Lispy nesting or to throwaway variables. (Edit. I’ve updated the Python regular expression to actually work. Thanks to many of you for pointing me to the findall method.)

import re

with open('companies') as f:
    companies = f.read().split("\n")

articles = ['article1.md', 'article2.md', 'article3.md']

http_regex = r"https?://(?:\w+\.)*(\w+)\.(?:com|org|net)"
url        = lambda x: re.findall(http_regex, x)
name       = lambda x: x.capitalize()
important  = lambda x: x in companies

def parse_article(a):
    with open(a) as f:
        return \
            filter(important,
                   map(name,
                       url(f.read())))


[y for x in map(parse_article, articles) for y in x]

# ... or, for the verbose, an alternative implementation
# of parse_article ...

def parse_article(a):
    with open(a) as f:
        # findall returns every captured domain name in the text
        urls = url(f.read())
        names = map(name, urls)
        mentioned = filter(important, names)

    return mentioned

My point here? Python does fine without blocks. It can do all of the nifty stuff that blocks do in Ruby. But lambda, with, for, and list comprehensions all come with their own special syntax. Maybe that’s not necessary.

Chains & side effects

I want to give you one more Ruby block example. Because it’s just cool.

tap is a method that exists purely for side effects, just like each. It was designed to give us those side effects inside of a chain. tap yields its receiver to a block and then returns its receiver from that same block. You can print or examine anything about the object within that block, and it doesn’t affect the chain in the least. Say we want to debug our transformation above. Given an info method that examines our current array and prints its size, we can say:

def parse_article(a, http_regex, important)
  File.read(a).         tap {|x| info x }.
    scan(http_regex).   tap {|x| info x }.
    flatten.            tap {|x| info x }.
    map(&:capitalize).  tap {|x| info x }.
    select(&important). tap {|x| info x }
end

parse_article 'article1.md', http_regex, important

# Hypothetical output:
# String (1000)
# Array (30)
# Array (30)
# Array (30)
# Array (10)

Try debugging like that in Python.
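For comparison, here is a rough Python sketch of the same idea. Python has no built-in tap, so tap and info below are hypothetical helpers I made up for this example, and the sample data is invented. It works, but each step needs its own wrapping call instead of a suffix on a chain.

```python
def tap(value, side_effect):
    # Run side_effect for its effect only, then hand value back unchanged.
    side_effect(value)
    return value

sizes = []

def info(collection):
    # Hypothetical inspector: record the type name and size.
    sizes.append((type(collection).__name__, len(collection)))

words = ['apple', 'google', 'initech', 'apple']
known = {'Apple', 'Google'}

capitalized = tap([w.capitalize() for w in words], info)
mentioned = tap([w for w in capitalized if w in known], info)
```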

The verdict

After mulling this over for a while, I’ve reached a tentative conclusion: Python is a pretty good language for some types of functional programming and a decent one for object-oriented programming, but not many people use the two together. That means that if you want to adopt a functional style but are using libraries built out of objects, you’ll run into conceptual road blocks. Changing coding style in the middle of a project is a drag, and I think what Python really needs is a culture around using functional techniques with objects.

Another great feature would be a chain operator that lets us chain over multiple lines. (Clojure, for example, gives us ->.)
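Nothing stops us from approximating such an operator by hand. The sketch below is only a rough analogue: thread is a hypothetical helper name, it handles one-argument functions only, and Clojure's -> is a macro that rewrites code rather than a function that calls other functions.

```python
from functools import reduce

def thread(value, *steps):
    # Feed value through each step in order, left to right.
    return reduce(lambda acc, step: step(acc), steps, value)

result = thread(
    ['apple', 'initech', 'google'],
    lambda names: map(str.capitalize, names),
    lambda names: filter(lambda n: n != 'Initech', names),
    list,
)
```

Each step sits on its own line and reads top to bottom, which is most of what a chain buys us.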

So functional programming is fun (and sometimes beautiful) in Python. Ruby’s beauty, on the other hand, comes from unity and adaptability.

Ruby is fantastically simple when you need it to be. (I know people who get by never having to pass a function.) But it scales really well with complexity. If your project is so big that you want to organize it with objects, you can. And you don’t have to sacrifice your functional style of programming to do so. If, instead, you want to organize your project around data structures and functions, you can do that too. The language is really malleable (sometimes too much), and its unifying principles really make it fun to use.

Ruby is not as elegant as Scheme, to be sure, nor is it quite as pliable. And it could use Python’s list comprehensions, because they are phenomenal. But I’m very happy using Ruby when I can, because it adapts to me, not the other way around. What syntax Ruby does have, it uses well. For example, coll.map &f is equivalent to map(coll, &f), but the first is more intuitive to lots of people.

The problem I have with Ruby is that it discourages functions as first-class citizens. Every function I try to define is tied to an object by default. Most of the time, that object is mutable and stateful, and that makes concurrent programming hard. In my opinion, what Ruby needs is to take a hard look at questions like “should we really invoke method names without parentheses by default?”, especially from the point of view that it makes higher-order functions harder to write.

Regardless of how much I like Ruby, work demands that I use Python. But Python isn’t nearly as ugly as I anticipated. Over time, my initial disgust has turned into curiosity. My bafflement has turned into respect. And I’ve decided: I really do like this language. There is nothing that I can’t do in Python that I could do in Ruby, and there was almost no learning curve adding it to my tool belt.

Instead of thinking Ruby or Python–or any language–is more beautiful than the other, this article has pushed me to think about how a language makes me code and how syntax can really matter sometimes. In the case of Python, I’ve decided it really wants for loops to take center stage when it comes to iterating over objects. And because I prefer to think using transformations like map and reduce, I tend to stay away from building complex class definitions. Ruby, on the other hand, blends functions and objects in an intuitive way (with blocks as a great default way of mixing the two). I’m okay using classes when they make sense because I don’t have to give up functional techniques when I do. And that works, too.

And all of this has an interesting effect: When I move between these similar languages, my style changes a lot, and that’s not a bad thing. It keeps the languages separate in my mind and makes me value them both for the insight they bring to the table.

It would be interesting to take a look at how these subtle syntax changes affect Python and Ruby at the community level.

Ruby & Python: a quick comparison

2011-01-22T00:00:00+00:00

The hook

Learning Python, the first thing I winced at was the hook.

Ruby and Python both let us code hooks, methods that the interpreter calls at important parts of an object’s life cycle.

When I design an object, for example, I want to describe its initial state, but I don’t want to manage memory. To separate those two things, Ruby and Python create objects by allocating resources, then calling my initializing hook, and finally by returning my object. The only part of that I need to specify is the part that’s unique to my code. From Ruby’s object.c:

VALUE rb_class_new_instance(int argc, VALUE *argv, VALUE klass) {
  VALUE obj = rb_obj_alloc(klass);
  
  /* Call my initialize hook */
  rb_obj_call_init(obj, argc, argv);

  return obj;
}

The initialize hook is nice when I don’t need to explicitly manage memory. All I have to do is define the correct hook, and it’ll be called when the time’s right. And I even get return obj; for free, just because of how the hook is designed.
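The same three steps can be spelled out by hand in Python. This is a simplified sketch, not how the interpreter is implemented: new_instance is a hypothetical helper that mirrors the C function above, and the real machinery also deals with classes that override __new__.

```python
class Point:
    def __init__(self, x, y):
        # The initializing hook: only the part unique to this class.
        self.x = x
        self.y = y

def new_instance(cls, *args):
    # Hypothetical helper mirroring rb_class_new_instance.
    obj = object.__new__(cls)   # allocate
    obj.__init__(*args)         # call the initialize hook
    return obj                  # return the object

by_hand = new_instance(Point, 1, 2)
usual = Point(1, 2)
```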

Ruby and Python both have hooks, though, so why is this a pivot point? The difference I’m getting at is how each language names hooks.

In Ruby, we find hooks that are clean and simple: initialize, to_s, <=>. Python names the same hooks, well, more cryptically: __init__(), __str__(), __cmp__().
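To make the naming concrete, here is a small sketch of a class that defines all three kinds of hook. Version is an invented example class. One caveat: Python 3 removed __cmp__, so the sketch uses the rich comparison hooks __eq__ and __lt__ and lets functools.total_ordering fill in the rest.

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):      # Ruby: initialize
        self.major, self.minor = major, minor

    def __str__(self):                     # Ruby: to_s
        return '%d.%d' % (self.major, self.minor)

    # Ruby's <=> becomes a pair of rich comparison hooks.
    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

versions = sorted([Version(2, 7), Version(3, 1), Version(2, 6)])
labels = [str(v) for v in versions]
```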

These seem ugly, at first, and I definitely don’t like typing them. What if a language can have ugly features without itself being ugly? What if ugliness acts as a guide toward solid code and consistent style?

(I wrote more on that in ‘The Ugliness of Python’.)

Function passing

Ruby and Python let us pass functions as first-class objects. In Python, I define functions and pass them around, uncalled, without much effort. It’s beautiful. I feed one into another as a normal variable. No special syntax required. And then when I’m ready to call a function, all I need to do is add parentheses, just like in math. It’s really a breath of fresh air coming from Ruby:

f = lambda x, y: x + y ** 3 - y
f
# => <function <lambda> at 0x100481050>

f(2, 3)
# => 26

reduce(f, [2, 3, 4], 0)
# => 90

In Ruby, things are more complicated. I need an ampersand to pass a function and brackets to call it:

f = lambda {|x, y| x + y ** 3 - y }
f
# => #<Proc:0x... (lambda)>

f.call(2, 3)
# => 26

# Brackets are syntactic sugar for #call:
f[2, 3]
# => 26

[2, 3, 4].reduce(0, &f)
# => 90

More importantly (and annoyingly), if I define a top-level method with def–let’s call those top-level def methods (TLDMs)–Ruby won’t let me pass it as a block to any other method. (TLDMs actually belong to an object, so strictly speaking, this makes sense. It’s still annoying.) In Python, we can pass lambdas and TLDMs like they’re identical.

So Ruby makes function-passing doable. Python makes it absolutely painless.
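A short sketch of what painless means here: a def function, a lambda, and a builtin all travel through the same list and get called the same way. The names cube, square, and apply_all are invented for the example.

```python
def cube(x):
    return x ** 3

square = lambda x: x ** 2

def apply_all(functions, value):
    # Functions travel in a list like any other value and are
    # only called when parentheses are added.
    return [f(value) for f in functions]

results = apply_all([cube, square, abs], -2)
```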

Functions, methods, objects

Out of the box, Python gives us great tools for functional programming goodness:

len(coll)
map(f, coll)
reduce(f, coll, i)
filter(f, coll)
str(obj)

Not all the Lisp functions are functions in Python, though–and this was a shocker for me. Some are instance methods. (Examples: capitalize(), reverse().) Actually, it’s hard to guess when Python will go one way or the other. It seems to follow convenience and tradition more than any kind of rationale, and that makes coding a little confusing. Sometimes it’s even hard to guess what the right receiver is. To join a list, for example, you have to pass the list to a string! (''.join(['my', 'list', 'here']).) This trips me up every time.
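Here is the split side by side, as a small sketch on one list. Everything used is standard Python; only the variable names are mine.

```python
words = ['my', 'list', 'here']

size = len(words)               # builtin function
joined = ' '.join(words)        # method on the separator string
shouted = joined.upper()        # method on the string

ordered = sorted(words)         # function: returns a new list
words.reverse()                 # method: mutates in place, returns None
```

Note that sorted and reverse differ in more than spelling: one hands back a new list, the other changes the list it is called on.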

In Ruby, every function has a receiver and is really a method. (Lambdas come close to being functions.) And for Ruby libraries, there isn’t usually a question about whether to manipulate a data structure with a method or a function. The Ruby equivalents of the above, for example, are all methods:

coll.length
coll.map &f
coll.reduce i, &f
coll.select &f
obj.to_s

Ruby lets us use these methods on other methods and on lambdas, so I can do this without a problem:

str = lambda {|x| x.to_s }

numbers = [1, 2, 3]
strings = ['a', 'b', 'c']

numbers.map &str
# => ['1', '2', '3']

strings.map &:capitalize
# => ['A', 'B', 'C']

(Note: methods need a colon after the ampersand because you’re naming a method to call, not passing an actual method object.)

Blocks

In Ruby, blocks unify anonymous functions, variable capture (closure), and iteration. On top of that, they make chaining really simple. Python gives us the tools to do all of the same things as Ruby, but they’re not quite as unified. (I’ll have a detailed example for you in the next article.)
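As a quick preview of what "not quite as unified" looks like, this sketch uses three separate Python tools for the three jobs a Ruby block covers: lambda for an anonymous function, a nested def for variable capture, and a for statement for iteration. make_counter is an invented example, and nonlocal assumes Python 3.

```python
def make_counter():
    count = 0
    def increment():
        # Closure: nonlocal lets the inner function rebind count.
        nonlocal count
        count += 1
        return count
    return increment

double = lambda x: x * 2            # anonymous function

counter = make_counter()
doubled = []
for n in [1, 2, 3]:                 # iteration is a statement
    counter()
    doubled.append(double(n))

calls = counter()
```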

Explicitness

Python holds explicitness as a virtue, and Ruby doesn’t. I find this topic fascinating and want to dive deeper in an upcoming article.

Miscellaneous

The differences I just listed are most important to me. Other people might include a couple of other biggies:

Ruby lacks list comprehensions
Python doesn’t give us literals for regular expressions
Python distinguishes attributes and methods; Ruby doesn’t
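Two of those points are easy to show in a few lines of Python. This is a sketch with an invented Article class and sample text: the regular expression is an ordinary string compiled through the re module, and naming a method without parentheses gives back a method object rather than calling it.

```python
import re

# No regex literal: patterns are strings compiled through the re module.
domain = re.compile(r'https?://(?:\w+\.)*(\w+)\.(?:com|org|net)')
found = domain.findall('See https://www.apple.com and http://example.org.')

class Article:
    def __init__(self, title):
        self.title = title          # attribute: read without parentheses

    def slug(self):                 # method: must be called
        return self.title.lower().replace(' ', '-')

article = Article('Big Data')
uncalled = article.slug             # a bound method object, not a string
```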

Ruby is beautiful (but I’m learning Python)

2010-11-23T00:00:00+00:00

Java, Ruby & expressiveness

Six years ago, I added Ruby to my technical arsenal. I learned C++ and Java in high school, and I planned to use them for statistics in college—mainly in the lab. But when I discovered Ruby, I knew something was different. Ruby let me be productive and get things done fast. It was ridiculously useful, for everything from renaming files to plotting finances to doing math homework to preparing lab reports. I didn’t need C’s speed or Java’s safety, so Ruby—a little slower, but dynamic—was perfect for me.

What struck me about Ruby? What features made me move from a static, fast language like Java to such a different paradigm?

First, the language is strikingly expressive.

A standard “Hello, World” program looks like this in Java:

class ThisIsAClassIDontReallyWantToNameButJavaMakesMe {
  public static void main(String[] args) {
    System.out.println("Hello World");
  }
}

Of course, this is an extreme example: Most systems in Java won’t see this high ratio of boilerplate to significant code. But the Java world does generally accept boilerplate as okay (saying it should be handled by an IDE), and that has pretty big effects on the ecosystem.

Yukihiro Matsumoto, on the other hand, doesn’t believe in much hierarchy and wants to avoid surprise. So he lets us do the same thing in Ruby much more easily:

puts 'Hello World'

Another obvious example of Ruby’s expressiveness is its API for IO.

In Java, IO can be a chore. Sometimes I think its designers didn’t think we’d ever do things like read files. But of course it’s not that. It’s just that Java is honed for building robust, large-scale systems with fine-grained control over bits and bytes. So the standard library isn’t optimized for scripting or naive implementations. In Java, users have to specify things like buffer interactions even if they don’t need them. Reading a file goes something like this (Note: I haven’t coded in Java for six years, so there may now be better ways):

import java.io.File;
import java.io.FileInputStream;

class FileReader {
  public static void main(String args[]) {
    try {
      File file = new File("./text.md");
      FileInputStream fis = new FileInputStream(file);
      byte[] contents = new byte[(int) file.length()];
      fis.read(contents);
      // do something with contents
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

The same task in Ruby looks something like this:

contents = File.read 'text.md'

Java certainly gives us more control here, and that’s really appropriate sometimes. But for engineers who don’t need to optimize and do read files a lot, especially while scripting, having an easy-to-remember method to read files is a game changer.

Blocks make function passing easy

So Ruby removes boilerplate and gives us lots of power in just a few keystrokes. That’s excellent, but if that were all Ruby offered, Java’s advanced IDEs could be just as good of a solution. But Ruby also lets us decouple our code into re-usable chunks much more easily. It does this by letting us define uncalled functions (as blocks) and manipulate them in our code.

When we pass a block to a method, we get the chance to specify how our outer code (the block definition and all of our local state) interacts with the inner code (the method definition). We pass our blocks around, uncalled by default, until exactly when we want to evaluate them. This means we can easily combine logic and state from multiple places to effectively decouple our code and achieve maximal code reuse. Taking a page out of the functional programming book, blocks effectively introduce higher-order programming to Ruby in a way that’s both useful and easy to understand.

Why, specifically, would I want to pass around functions in a language?

Certainly when I was coding in Java, I never knew I needed to. I happily chipped away at similar problems repeatedly using for loops, iterators and tightly coupled functions. Those were the language primitives I was given.

But Ruby enlightened me. (If I had been around during Lisp’s heyday, I would’ve learned that earlier.) Instead of implementing endless loops to work through data, Ruby showed me I could outsource that work to a standard set of methods for lots of tasks. For example, I learned that to transform every item in a list, all I need to do is call map on that list, and pass in a block that does the transformation. In turn, it’s the array, and not my application code, that has already defined how to run that function on every item it contains.

I could also slice and dice a list with a block, or reduce it down to a single value. The point is that instead of having to repeat implementation details and edge cases in multiple spots (think C-style for loops), I can code at a level of abstraction that’s appropriate for my application.

Ruby is so well designed that for loops aren’t very common and, in fact, most Rubyists scratch their heads if you do use one.

There have been a few attempts to harness higher-order functions in Java. However, so far they’ve seemed clunky, and they haven’t taken the Java world by storm. (Note: There is a project underway that aims to add anonymous functions to Java. I look forward to seeing how that goes.)

A case study: list filtering

If you’re already on board with higher-order programming, you already know the power that comes from Ruby’s blocks, and you can skip this section. But if you don’t believe me, let’s try to make this more concrete. When exactly can we benefit from function passing? What does this look like?

It’s often the case that we have a collection of items that we want to filter before processing. Maybe we’re looking at a dataset and we want our dataset without empty values. Or maybe we want to get all of the words in a list that have more than 3 characters. Or all users who have logged into our site within the last hour.

Naturally when you want to filter something, you need an actual standard, a criterion that acts as a guard for your data, one that allows some numbers to pass into a new list and bounces others. (In fact, a more advanced type of filter is a winnower: a function that returns two sets of data. The first meets our criterion. The second does not.) That sounds nice in theory, but what does it mean to be a criterion? How do we code that?
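The winnower from that aside is small enough to sketch. I'm writing it in Python, the language this post ends up turning to. The name winnow and its argument order are my own choices, not a standard-library function.

```python
def winnow(criterion, items):
    # Split items into (passed, bounced) according to criterion.
    passed, bounced = [], []
    for item in items:
        (passed if criterion(item) else bounced).append(item)
    return passed, bounced

evens, odds = winnow(lambda n: n % 2 == 0, [1, 2, 3, 4, 5, 6])
```

The criterion is just a function handed in from outside, which is exactly the piece the next example has to hardcode.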

A common way in languages without higher-order functions is to iterate over a collection and manually collect items that meet the hardcoded expression in an if statement:

import java.util.Arrays;

class ArrayFilterer {
  public static int[] filterEven(int[] array) {
    int[] filteredArray = new int[array.length];
    int filteredArraySize = 0;

    for (int i : array) {
      // The next line is the only unique part of
      // this function!
      if (i % 2 == 0) {
        // Someone please tell me there's an easier way
        // to append to an array than this
        filteredArray[filteredArraySize] = i;
        filteredArraySize += 1;
      }
    }

    return Arrays.copyOfRange(filteredArray, 0, filteredArraySize);
  }

  public static void main(String[] args) {
    int[] array = { 1, 2, 3, 4, 5, 6 };
    filterEven(array); // => { 2, 4, 6 }
  }
}

This is a fine way to filter if we need to do it once ever. But what if we want to filter the same list using another criterion? In Java, we build another loop and filter again!

It may be okay to program at such a repetitive, low level for performance-sensitive code (like you might in C). But what if we could decouple the criterion and the grunt work for this filter? If we could do that, not only could we have more reusable chunks of code, but some of those key chunks might even be provided by the standard library!

To get a handle on this goal, notice that the only part of our Java function that has to do with “evenness” is the expression inside the if statement.

if (i % 2 == 0)

That’s one line out of eight! And in fact, only half of that line is unique. Now, we can’t extract that body in Java because that line needs access to each element of the array in turn (i). We also can’t just pass the % 2 == 0 bit because Java doesn’t allow currying (leaving the operator unevaluated until it has enough arguments, in this case).

If only we could pass around an uncalled set of code, ready to be executed on parameters any time we chose… Oh wait, that’s what a function is!

While Java may not (yet) support passing functions around as objects, Ruby does, so let’s do this thing:

def filter(arr, &criteria)
  filtered_arr = []

  arr.each do |elem|
    if criteria.call(elem)
      filtered_arr << elem
    end
  end

  filtered_arr
end

array = [1, 2, 3, 4, 5, 6]
criterion = lambda {|n| n % 2 == 0 }

filter(array, &criterion)
# => [2, 4, 6]

See that? I defined functions for general filtering and for my criteria completely independently and simply passed criterion in to filter at runtime. This decoupling gave us the benefits we wanted with almost no syntax overhead.

And in fact, because code can be separated along these lines, the collections in our standard library have already defined the most common functions for us, so lots of the time, the only thing we need to provide is a block (and not the reusable implementation).

We can rewrite the above using only methods provided by the standard library:

array.select(&:even?)
# => [2, 4, 6]

This time I didn’t have to define a single function! I re-used two completely decoupled methods, Array#select and Integer#even? (methods that didn’t even know about each other), to accomplish this task.

The takeaway: if you let your language deal with repetitive details like looping and even-ness, you get to focus on the unique bits of code that differentiate your application from everyone else’s. And having so many reusable methods available to us makes it not only useful, but also easy to do the right thing by default.

From Java to Ruby

Higher-order functions have transformed my code so much that I barely recognize what I was writing without them. In fact, in 2005, I stopped coding in Java and learned Ruby. Because I wasn’t working in an environment where stability or fine-grained data structure manipulation was necessary, I shifted in favor of ease of use instead of perfect control. Yes, Java ran faster than Ruby. Yes, more people were ‘doing’ Java. No, Ruby would not get you a job (this was before Rails 2). But I didn’t care about any of that. I wanted an easy, elegant way to make my computer work for me, to write scripts, create libraries, analyze data, and build Web applications. I wanted to move quickly, get things done. I was tired of spending too much time on boilerplate code and not enough time solving real problems.

(Update: As of 2016, my needs have shifted in the other direction, and I’m much more in favor of statically typed systems like Haskell. However, I think it was important for the pendulum to swing in the Ruby direction during the early stages of my career. I also think that Haskell’s type system is far better than Java’s, which is pretty strongly tied to mutable objects.)

I think the industry still hasn’t grasped the elegance of the style of code I’m looking at here. I get an expressive, flexible syntax, almost like Lisp. But in a friendly language that tries, above all else, not to surprise me. I get the ability to interoperate with Unix with just a backtick, but the language also runs on Windows and other platforms. And over the last six years, this power and expressiveness has been invaluable to me. I’ve learned to build websites using Rack and Sinatra and Rails—in fact, the site you’re looking at is powered by Ruby that I wrote—and I feel like an expert at the language.

I’m in the process of building out a couple of open source libraries. And generally, I’m satisfied with the way things are in the Ruby community. Exciting things are happening.

What I’m missing (and it’s not the semi-colon)

Okay, I’ve spent lots of time praising Ruby for being beautiful, expressive and pragmatic. I do love the language and think it makes programming painless in lots of ways. But the Ruby community is not giving me one very important thing, something that’s vital for me at work: solid tools for science and statistics. I’ve already leveraged Ruby at work to dramatically speed up some of our high-throughput experiments (for processing and summarizing data), but I’ve always had to pipe my numbers into R for statistics and graphing. Because unfortunately, despite all the hubbub around Ruby, no one seems to be crunching numbers with it. I’m not completely comfortable with R, though, and I want a one-stop solution for my numbers needs. So I’m going to do that thing that no Ruby programmer wants to do: I’m going to learn Python.

Now, learning Python is something that is usually frowned upon in the Ruby world, probably just because it’s the ‘rival language’ and is pretty similar. The traditional argument is: “Don’t learn Python! It’s a dynamically typed, multi-paradigm, interpreted scripting language, just like Ruby! And it’s ugly.” And all of that is true. But I’ve found (over four years, and especially this year) that really Ruby is focused on one thing, and that’s web development.

I love Web dev, and I’ve done my fair share of it. But I also have a day job that depends a lot on statistics.

In theory, I could port functions from SciPy and NumPy to Ruby. It’s been tried before without success (see the failed SciRuby project) and I’m pretty confident I don’t want to go that route. It takes a community, and not just one person, to foster something like that. And I have other things to focus on now.

Instead, I’m going to leverage the huge data ecosystem that’s grown around Python and add the language to my résumé. SciPy is unrivaled in the modern programming world, and I plan to embrace it for projects at work. What’s more, Python supports higher-order functions like Ruby, so I’m not missing out on all the functional goodness I described earlier.
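To make the higher-order functions point concrete, here is a minimal sketch of the style in Python. The sample numbers and the names readings, above and compose are made up for illustration; only filter, map and functools.reduce come from Python itself.

```python
from functools import reduce

# Made-up sample data, purely for illustration.
readings = [4.0, 8.5, 15.0, 16.5, 23.0, 42.0]

def above(threshold):
    """Return a predicate function (a function that returns a function)."""
    return lambda x: x > threshold

def compose(*funcs):
    """Combine functions into one, applied right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

# Functions are passed around as ordinary values.
high = list(filter(above(10), readings))
doubled = list(map(lambda x: x * 2, high))
total = reduce(lambda acc, x: acc + x, doubled, 0.0)

# compose builds a new function out of existing ones.
double_then_round = compose(round, lambda x: x * 2)
```

It reads less like a chain than the Ruby equivalent would, since filter and map are free functions here and not methods on the list, but the same ideas carry over.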

However, this big change is not without its problems.

Python package management

This past week, I started my foray into the Python world. This involved installing Python 2.7 and 3.1, bookmarking Dive into Python, and figuring out package management.

Er, I guess, by “figuring out”, I mean “being completely baffled by”.

At the moment, Python package management seems to be fragmented and complicated. I am used to typing something like gem install symbolic when I want to install something on my machine. It’s standard, simple, and rarely causes problems.

In Python, though, there seem to be competing managers (easy_install and pip) and separate ways to package libraries for uploading. I’m also hearing names like setuptools, distutils, distribute and virtualenv, and I have to say that the whole ecosystem isn’t too clear to me yet. And the documentation tends to assume I know what all the above mean already!

After asking around, I take it I should use pip for package management; apparently pip is the future. In fact, it looks like it’s meant to mimic RubyGems. So installing SciPy should be a simple pip install numpy scipy.

Awesome. Easy as Ruby.

But wait. What’s this? I see a lot of text moving down my screen. God. My computer is starting to heat up. Now I’m seeing errors all over the place. “You need a FORTRAN compiler. Found gfortran. Installation failed.”

Wha?! I mean, I expected C dependencies, sure. I would hate to do math without them. But FORTRAN? Is FORTRAN something we’re still installing in 2010?

I’m hoping to get this all sorted out soon and start doing some heavy stats with the new language. I’m excited to be joining a group of people who are focused on data and experiments, and not only HTTP requests, MVC, APIs, jQuery and event processing. That’s not a jab at the Rubyists I know. I’m just excited to put a full language (and not just a web framework) to good use.

What to like about Ruby

Now, I want to be clear: I like Ruby better than Python. Its “developer UX” makes more sense to me than Python’s, and the language itself seems more expressive. I love that objects know how to map or filter themselves, which leads to elegant chaining. I love that blocks unify Ruby’s closures, anonymous functions, and iteration. That Ruby has such versatile syntax that it can masquerade as C or Perl or Scheme (more on that some other time). That when I code in Ruby, it does everything the way I would expect it to–without reading documentation.

But Python and Ruby aren’t so different, and I want a strong data community to work with. I don’t want to duplicate effort to bring solid libraries to Ruby, and Python seems like the way to go. I might even use this new language to dabble in machine learning using PyBrain and the Natural Language Toolkit, which both, well, rock my socks off. The potential for number crunching in Python seems endless.

Anyway, statistics knowledge is in demand and probably will be in the future, so I’m happy to become competent with these tools. Maybe someday, serious number crunching will be viable on the Ruby platform–and I look forward to that day. For now, Python is going into my toolbelt. And I welcome the challenge that implies.