Professor Ross Ihaka is a star of the statistical world. He tells Julie Middleton how a simple programming language he helped develop to assist undergraduate students in the 1990s went global – and is still growing.
For most of us, R is the 18th letter of the alphabet or the rating of movies you won’t allow the kids to watch. But for many of the world’s major companies, R is a must-have software package that has radically changed the way they mine data. Google employs R to calculate the return on investment in advertising campaigns, and Ford to improve the design of its vehicles. In New Zealand, ANZ uses R for credit-risk analysis and Air New Zealand for crunching customer data, and it’s widely used in Government departments such as Statistics New Zealand, the Inland Revenue and the Department of Conservation.
R is taught in universities all over the world, including Stanford, Berkeley, Harvard, Oxford and Cambridge, and has spawned dozens of books. Companies have built entire analytic platforms around it, and R developers are among the highest-paid. Microsoft is exploring how to incorporate R into some of its products.
No-one is more “stunned” at R’s success than Associate Professor Ross Ihaka (Pākehā, Ngāti Kahungunu ki Wairarapa, Rangitāne) of the Department of Statistics. “When R was born in the early 1990s, I didn’t really expect it would be used outside of the University of Auckland,” says Ross, a quietly wry, self-confessed Westie. “There were certainly no thoughts of world domination.”
But dominate R has, making Ross and his collaborator, Robert Gentleman, the statistical world’s version of rock stars. Simply put, R allows people to wrangle lots of data at once. Its success lies in its user-friendly simplicity, coupled with power and versatility: people without major programming training find it easy to use.
Critically, R is open-source, which means it’s free and anyone can develop add-ons, as long as they share the underpinning source code. And develop they do, enthusiastically. As of January this year, there were more than 7,800 plug-ins. For example, there’s one that analyses speech patterns, another that’s used in genome studies and, for fun, one that generates Sudoku puzzles. It is this extensive library that puts R ahead of its competitors; in the last year alone, there have been 150 million downloads of packages.
The story all began back in the early 1990s when the internet was in its infancy and computers at the University of Auckland were boxy Macintoshes with floppy disks. Ross had done his undergraduate degree at Auckland then his masters and PhD at Berkeley, returning to a lecturing post in the Department of Statistics. Undergraduate students were using what Ross calls “old and clunky programmes” for their data analysis, and he thought there had to be a better way.
So when Canadian colleague Robert Gentleman stopped Ross in the corridor one day and suggested they write some software together, he agreed. Neither was a programming expert, but in 1991 they started tinkering – Ross calls it “playing games”.
They eventually began creating “a basic structure that people could start plugging things into”; what emerged was a user-friendly tool for students to do data analysis and produce graphical models of that information. They called it R, after the initials of their first names. R quickly became the mainstay of statistics classes, with student grumbles providing the opportunity to make improvements. “Originally, our big ambition was to use it for teaching first-year classes – small, local things.”
Ross and Robert never commercialised R “because we couldn’t deliver any final product, but we could provide the means for other people to develop a final product”. When overseas colleagues became interested, they made R available to all.
And that suits Ross, who says he’s an “anarchist from way back”: in fact, he has described R as proof of the success of the “rusting-hulk model of software development. If you went to a junkyard and hauled out an old junker and put it by the road and stood there looking helpless, people, being do-it-yourself types, would step in and help you out, and after a couple of hours you’d have a pretty good car.
“So we cobbled this thing together and hung it out by the side of the internet, and after a few years we had a pretty good piece of software. But it’s the contribution of lots of people.” The 1996 paper that the pair wrote introducing R to the world has been cited a whopping 8,300 times, according to Google – the cut-through that academics dream of. The acknowledgements at the back of the paper thank “all our colleagues and students that were and still are our guinea pigs”.
Hadley Wickham was one of those guinea pigs; he was doing his BSc in Statistics and Computer Science when he met R. “I still have the strongest memory of being baffled and surprised by the way that R worked,” says Hadley, now based in the US, “and that initial surprise led me to dig very deeply into R over the course of my career.” So deeply, in fact, that his highly successful company, which builds data science tools, is completely built on R. But Hadley, who is also an Adjunct Professor of Statistics at the University of Auckland, jokes that he also uses R for “challenges that others would not” – like building a public website to store his family’s recipe collection. Yep, you can build websites with R, too.
Ross and Robert, who is now back in North America, weren’t to know it then, but their timing was perfect. “With hindsight, you can see that here was a gap needing to be filled, and in a sense we came along at the perfect time,” reflects Ross. “The internet was getting started, and free software was in the air, and people were beginning to think about contributing to free projects. We had no plans for world domination – we shared it with students, and it grew organically from there.”
R is, then, the ultimate virus, replicating and changing itself with the devoted help of thousands of people. Ross says that observing the global labour of love has tempered his cynicism. “R changed my opinion of humanity to some extent, to see how people are really willing to freely give of themselves and produce something larger than themselves without any thought of personal glory. There’s a lot of work with no recognition.”