A week or so ago, a list of perverse incentives in academia made the rounds. It offers examples like “rewarding an increased number of citations,” which, instead of encouraging work of high quality and impact, results in inflated citation lists, an academic tit-for-tat that has become standard practice. Likewise, rewarding a high number of publications doesn’t produce more good science, but merely finer slices of the same science.
It’s not as if perverse incentives in academia are news. I wrote about this problem ten years ago, referring to it as the confusion of primary goals (good science) with secondary criteria (for example, the number of publications). I later learned that Steven Pinker made the same distinction for evolutionary goals, referring to it as ‘proximate’ vs ‘ultimate’ causes. The difference can be illustrated in a simple diagram (see below). A primary goal is a local optimum in some fitness landscape – it’s where you want to go. A secondary criterion is a first-order approximation of the direction toward that local optimum. But once you’re on the way, higher-order corrections must be taken into account; otherwise the secondary criterion will miss the goal, often badly.
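The relation between a primary goal and a secondary criterion can be made concrete with a toy calculation. The sketch below is purely illustrative and rests on made-up assumptions: a one-dimensional landscape `fitness` with a single peak at x = 3, a starting point x = 0, and a hypothetical walker `climb` that takes fixed steps as long as its score improves. The secondary criterion is the first-order (linear) approximation of the landscape at the start; `second_order_proxy` adds the next correction.

```python
# Toy illustration with a made-up one-dimensional "fitness landscape".
# Assumption: the primary goal is a single peak at x = 3; we start at x = 0.

START = 0.0

def fitness(x: float) -> float:
    """Primary goal: the quantity we actually care about (peak value 9 at x = 3)."""
    return 9.0 - (x - 3.0) ** 2

def fitness_slope(x: float) -> float:
    """First derivative of fitness."""
    return -2.0 * (x - 3.0)

FITNESS_CURVATURE = -2.0  # second derivative of fitness (constant for a parabola)

def first_order_proxy(x: float) -> float:
    """Secondary criterion: linear approximation of fitness around START."""
    return fitness(START) + fitness_slope(START) * (x - START)

def second_order_proxy(x: float) -> float:
    """Refined criterion: first-order proxy plus the quadratic correction."""
    d = x - START
    return first_order_proxy(x) + 0.5 * FITNESS_CURVATURE * d * d

def climb(score, start: float = START, step: float = 1.0, n_steps: int = 10) -> float:
    """Step to the right by `step` whenever that improves `score`; return the end point."""
    x = start
    for _ in range(n_steps):
        if score(x + step) > score(x):
            x += step
    return x

for name, score in [("first-order proxy", first_order_proxy),
                    ("second-order proxy", second_order_proxy),
                    ("primary goal", fitness)]:
    end = climb(score)
    print(f"{name}: stops at x = {end}, fitness there = {fitness(end)}")
```

Following the first-order proxy, the walker never stops: it ends at x = 10 with a fitness of -40, worse than where it started, while the second-order proxy and the primary goal itself both stop at the peak. For a parabola the second-order correction happens to be exact; in a less contrived landscape each refinement would typically only push the point of divergence further out.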
The number of publications, to come back to this example, is a good first-order approximation. Publications demonstrate that a scientist is alive and working, is able to think up and finish research projects, and – provided the papers are published in peer-reviewed journals – that their research meets the quality standard of the field. To second approximation, however, increasing the number of publications does not necessarily lead to more good science. Two short papers don’t fit as much research as two long ones. Thus, to second approximation, we could take into account the length of papers. Then again, the length of a paper is only meaningful if it’s published in a journal that has a policy of cutting superfluous content. Hence, you have to refine the measure further. And so on.
This type of refinement isn’t specific to science. You can see in many other areas of our lives that, as time passes, the means to reach desired goals must be defined more carefully to make sure they still lead where we want to go. Take sports as an example. As new technologies have arisen, the Olympic committee has added many criteria on which shoes or clothes athletes are allowed to wear and which drugs make for an unfair advantage, and it has had to rethink what distinguishes a man from a woman.
Or tax laws. The Bible left it at “When the crop comes in, give a fifth of it to Pharaoh.” Today we have books full of ifs and thens and whatnots, so incomprehensible that I suspect it’s no coincidence suicide rates peak during tax season. It’s debatable, of course, whether current tax laws indeed serve a desirable goal, but I don’t want to stray into politics. Relevant here is only the trend: Collective human behavior is difficult to organize, and it’s normal that the secondary criteria used to reach primary goals must be refined as time passes.
The need to quantify academic success is a recent development. It’s a consequence of changes in our societies, of globalization, increased mobility and connectivity, and it is driven by the growing number of people in academic research. Academia has reached a size where accountability is both important and increasingly difficult. Unless you work in a tiny subfield, you almost certainly don’t know everyone in your community and can’t read every single publication. At the same time, people are more mobile than ever, and applying for positions has never been easier. This means academics need ways to judge colleagues and their work quickly and accurately. It’s not optional – it’s necessary. Our society changes, and academia has to change with it. It’s adapt or die.
But what has been academics’ reaction to this challenge? The most prevalent reaction I witness is nostalgia: the wish to return to the good old times. Back then, you know, when everyone on the committee had the time to actually read all the application documents and was familiar with all the applicants’ work anyway. Back when nobody asked us to explain the impact of our work and we didn’t have to come up with 5-year plans. Back when they recommended that pregnant women smoke. Well, there’s no going back in time, and I’m glad the past has passed. I therefore have little patience for such romantic talk: It’s not going to happen, period. Good measures for scientific success are necessary – there’s no way around it.
Another common reaction is the claim that quality isn’t measurable – more romantic nonsense. Everything is measurable, at least in principle. In practice, many things are difficult to measure. That’s exactly why measures have to be improved constantly.
Then, inevitably, someone will bring up Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” But that is clearly wrong. Sorry, Goodhart. If you indeed want to optimize the measure, you get exactly what you asked for.
The problem is that often the measure wasn’t what you wanted to begin with. Using the terminology introduced above, Goodhart’s Law can be reformulated as: “When people optimize a secondary criterion, they will eventually reach a point where further optimization diverges from the main goal.” But our reaction to this should be to improve the measure, not to throw in the towel and complain “It’s not possible.”
This stubborn denial of reality, however, has an unfortunate consequence. Academia has gotten stuck with the simple-but-bad secondary criteria currently in use: number of publications, the infamous h-index, the journal impact factor, renowned co-authors, positions held at prestigious places, and so on. We all know they’re bad measures. But we use them anyway because we simply don’t have anything better. If your director/dean/head/board is asked to demonstrate how great your place is, they’ll fall back on the familiar number of publications and, as a bonus, point out who has recently published in Nature. I’ve seen it happen. I just had to fill in a form for the institute’s board in which I was asked for my h-index and my paper count.
Last week, someone asked me if I’d changed my mind in the ten years since I first wrote about this problem. Needless to say, I still think bad measures are bad for science. But I think I was very, very naïve to believe that just drawing attention to the problem would make any difference. Did I really think that scientists would see the risk to their discipline and do something about it? Apparently that’s exactly what I believed. Of course nothing like this happened. And it’s not just because I’m a nobody nobody listens to. Concerns similar to mine have been raised with increasing frequency by more widely known people in more popular outlets, like Nature and Wired. But nothing’s changed. The biggest obstacle to progress is that academics don’t want to admit the problem is of their own making.
Instead, they blame others: policy makers, university administrators, funding agencies. But these merely use measures that academics themselves are using. The result has been lots of talk and little action.
But what we really need is a practical solution. And of course I have one on offer: open-source software that allows every researcher to customize their own measure of what they think is “good science,” based on the available data. That would include the number of publications and their citations. But the data contain much more information that currently isn’t used. You might want to know whether someone’s research connects areas that are only loosely connected. Or how many single-authored papers they have. You might want to know how well their keyword cloud overlaps with that of your institute. You might want to develop a measure for how “deep” and “broad” someone’s research is, two terms that are often used in recommendation letters but are extremely vague. Such individualized measures would not only update automatically as people revise their criteria, but would also counteract the streamlining of global research and encourage local variety.
Why isn’t this happening? Well, besides me there’s no one to do it. And I have given up trying to get funding for interdisciplinary research. The inevitable response I get is that I’m not qualified. Of course that’s correct – I’m not qualified to code and design a user interface. But I’m totally qualified to hire some people and kick their asses. Trust me, I have experience kicking ass. Price tag to save academia: an estimated 2 million euros for 5 years.
What else has changed in the last ten years? I’ve found out that it’s possible to get paid for writing. My freelance work has been going well. The main obstacle I’ve faced is lack of time, not lack of opportunity. And so, when I look at academia now, I do it with one leg outside. What I see is that academia needs me more than I need academia.
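The customizable measure proposed above doesn’t exist yet, so what follows is only a minimal sketch of the idea, not a design for it. Everything in it is hypothetical: the `Paper` record, the `features` and `custom_score` helpers, the feature names, and the weights. Keyword-cloud overlap is approximated here by the Jaccard similarity of keyword sets, which is just one plausible choice. The point is that the measure is a user-supplied weighting over features extracted from the same data, so different people can score the same record differently.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    """Hypothetical publication record; real data would come from a bibliographic database."""
    title: str
    n_authors: int
    citations: int
    keywords: set[str] = field(default_factory=set)

def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword overlap as |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def features(papers: list[Paper], institute_keywords: set[str]) -> dict[str, float]:
    """Extract a few candidate indicators from a publication list."""
    all_keywords = set().union(*(p.keywords for p in papers))
    return {
        "papers": float(len(papers)),
        "citations": float(sum(p.citations for p in papers)),
        "single_authored": float(sum(1 for p in papers if p.n_authors == 1)),
        "keyword_overlap": jaccard(all_keywords, institute_keywords),
    }

def custom_score(feats: dict[str, float], weights: dict[str, float]) -> float:
    """User-defined measure: weighted sum; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

# Usage with made-up records and made-up weights.
papers = [
    Paper("Paper A", n_authors=1, citations=12, keywords={"cosmology", "dark matter"}),
    Paper("Paper B", n_authors=4, citations=30, keywords={"quantum gravity"}),
]
institute = {"cosmology", "quantum gravity", "black holes"}
feats = features(papers, institute)

# One committee might value single-authored work and topical fit...
print(custom_score(feats, {"single_authored": 2.0, "keyword_overlap": 10.0}))
# ...another might stick with the familiar paper and citation counts.
print(custom_score(feats, {"papers": 1.0, "citations": 0.5}))
```

With these made-up records, the first weighting gives 7.0 and the second 23.0: same data, different notion of “good science.” Indicators such as how well someone’s work connects loosely connected areas, or how “deep” and “broad” it is, would need richer data (for example, a citation graph) and are left out of this sketch.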
The current incentives are extremely inefficient and waste a lot of money. But nothing is going to change until we admit that solving the problem is our own responsibility.
Maybe, when I write about this again, ten years from now, I won’t refer to academics as “us” but as “them.”