Future of Statistical Programming


The basic idea is that there's a gap between the tools we use for teaching and learning statistics and the tools we use for doing statistics. Worse, there's no trajectory connecting the two: nothing leads a student from the learning tools to the doing tools. I think that learners of statistics should also be doers of statistics. So a tool for statistical programming should be able to step learners from learning statistics and statistical programming to truly doing data analysis.

When I refer to tools for learning statistics, I mean things like applets, TinkerPlots, and Fathom. I have nothing against these tools; I think they do a great job of teaching statistical concepts. But as with any new software tool, they take some cognitive effort to learn, and I'm not sure there's a great payoff for that effort. You can't put TinkerPlots on your resume, and if you actually want to apply the skills you've learned to real data, you need to learn another tool.

And when I talk about tools for doing statistics, I mean SAS, Stata, SPSS, Python, Julia, or R. These tools require some traditional "programming," but they allow for much more flexibility in what you can produce. If you want to do something that the tool doesn't currently offer, you can create it. Personally, I do most of my statistical programming in R. It's the statistical programming language with the biggest community of users and the widest variety of user-contributed packages. Python and Julia are getting mentioned more and more, but neither is yet the tool of the majority.

And in fact, when I teach statistics, I'm often teaching R. In the 100-level statistics classes at UCLA (those for undergraduate statistics majors) that's the tool of choice, and after years of trial and error, we're also using it in Mobilize. This can be a huge challenge, because R wasn't exactly designed for ease of learning. To make it easier on teachers and students, we've created an R package of wrapper functions, MobilizeSimple, which simplifies tasks like text analysis and map-making. But R still has many idiosyncrasies, like the $ syntax versus the formula (model) syntax. Since both syntaxes are technically correct, there's no consensus about which to use in R code, and learners have to deal with both.
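To make the two syntaxes concrete, here is a minimal sketch using R's built-in mtcars data set. The data set and the variable names are assumptions chosen only for illustration; they don't come from the UCLA or Mobilize materials. Each pair of calls computes the same thing, once with $ and once with a formula.

```r
# Built-in example data, used here only for illustration.
data(mtcars)

# Dollar-sign syntax: pull each column out of the data frame by name.
means_dollar <- tapply(mtcars$mpg, mtcars$cyl, mean)  # mean mpg per cylinder count
fit_dollar   <- lm(mtcars$mpg ~ mtcars$wt)            # regress mpg on weight

# Formula ("model") syntax: name the variables, pass the data frame separately.
means_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
fit_formula   <- lm(mpg ~ wt, data = mtcars)

# Both routes give the same estimates; only the notation differs.
print(coef(fit_dollar))
print(coef(fit_formula))
```

The fitted coefficients match, but a learner reading other people's code has to recognize both forms as the same analysis.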

Bridging the gap