The software developer job interview doesn’t work. Companies should stop relying on it. The savviest teams will outcompete their peers by devising alternative hiring schemes.
Years from now, we’ll look back at the 2015 developer interview as an anachronism, akin to hiring an orchestra cellist with a personality test and a quiz about music theory rather than a blind audition.
Being good at navigating hiring processes requires a basket of skills that isn’t correlated with job performance. The world is full of people who can speak expertly about programming, but can’t effectively code. The majority of people who can code can’t do it well in an interview. Our hiring process therefore systematically misprices candidates. It’s a moral problem and a market failure. Profit from its correction.
Software developers are hard to hire. Security people are hard to hire. Software security testers work in the intersection of those two sets, and are especially hard to hire.
Fewer than 1 in 100 software security testers can competently test cryptography; real-world cryptography testing requires anomalously good programming skill and conversance with a sprawling literature of industry and academic crypto research. You’d think that this literature comes with the turf for software security pros, but that’s not remotely true. Cryptography engineers are prohibitively hard to hire.
Now bear with me because this is going to get a little discursive.
There’s an important construction in cryptography called DSA, the “digital signature algorithm”. Digital signatures are a foundation of public key crypto. DSA and its elliptic curve variant (ECDSA) are used to secure network traffic, game consoles, and operating systems. (Yes, and Bitcoin.)
DSA/ECDSA is terribly easy to mess up. For example, you can make a trivial programming mistake and reuse a nonce. (A nonce is a random number that must be unique for a given key.) When you do, a high school algebra student can recover your private key from a pair of public signatures. This attack broke, among other things, the PlayStation 3.
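To make the “high school algebra” concrete, here is a minimal sketch of the nonce-reuse recovery in Python. It uses plain DSA rather than the elliptic curve variant, and the parameters (`P`, `Q`, `G`), the key, the nonce, and the message digests are toy values made up for illustration; nothing here comes from a real system. A signature is `s = k⁻¹(h + x·r) mod q`, so two signatures that share `k` (and therefore `r`) give two equations in two unknowns.

```python
# Toy DSA parameters, far too small for real use.
# Q divides P - 1, and G generates the subgroup of order Q modulo P.
P, Q, G = 2039, 1019, 4


def sign(x, h, k):
    """Sign integer digest h with private key x and nonce k; returns (r, s)."""
    r = pow(G, k, P) % Q
    s = (pow(k, -1, Q) * (h + x * r)) % Q
    if r == 0 or s == 0:
        raise ValueError("degenerate signature; pick another nonce")
    return r, s


def recover_private_key(h1, sig1, h2, sig2):
    """Recover the private key from two signatures made with the same nonce."""
    r1, s1 = sig1
    r2, s2 = sig2
    if r1 != r2:
        raise ValueError("signatures do not share a nonce")
    # s1 - s2 = k^-1 * (h1 - h2)  =>  k = (h1 - h2) / (s1 - s2)
    k = ((h1 - h2) * pow((s1 - s2) % Q, -1, Q)) % Q
    # s1 = k^-1 * (h1 + x * r)    =>  x = (s1 * k - h1) / r
    return ((s1 * k - h1) * pow(r1, -1, Q)) % Q


# Hypothetical values: private key 123, nonce 7 reused for two digests.
sig_a = sign(123, 100, 7)
sig_b = sign(123, 200, 7)
recovered = recover_private_key(100, sig_a, 200, sig_b)
```

With these values `recovered` equals the private key, `123`. The attacker never sees the key or the nonce, only the two digests and the two signatures. (`pow(a, -1, m)` for modular inverses needs Python 3.8 or later.)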
I have a crypto consigliere, Nate Lawson, to whom I owe basically everything I know about attacking crypto. Perhaps intending to put me in my place, Nate sent me a paper. It described a variant of that simple attack. The new attack targets a programming mistake that is even easier to make than repeating a nonce. (The new bug: using a biased nonce, where not every bit is perfectly random. Ask me over a beer how this happens.) The exploit for this flaw is most definitely not a high school algebra problem.
So, some color on how difficult this attack is. First: where the Playstation 3 flaw requires only a pair of signatures, this attack requires the collection and processing of many thousands. But that’s not the tricky part. The tricky part starts with a lattice reduction step, which is a complicated linear algebra process one of the world’s best cryptographers mocks other cryptographers for not understanding. But wait! There’s more! After the lattice reduction step, the attack involves a kind of binary search algorithm that relies on a Fourier transform.
I skimmed the paper, did indeed feel put in my place, nodded gravely, and passed it to my team, forgetting to accompany it with “this is an illustration of the kind of crypto attack it is not reasonable for us to implement”.
Two days later, I’m collecting team members to grab coffee with. Getting up to join us, Alex, who has been on the team for just a few months, offhandedly informs me that he’s written a working exploit for the attack.
That exploit code, which cannot possibly exist in my office, might be one of just a few implementations in the world.
Give me a month and good test vectors, and I’ll get a crypto attack that needs an IFFT working, through sheer brute force trial and error. But here, you don’t know if you got the Fourier step right unless you also get the lattice reduction step working. Fun extra detail! The BKZ lattice basis reduction algorithm? Each edit/compile/run trial takes 4-6 hours.
Nothing in Alex’s background offered a hint that this would happen. He had Walter White’s resume, but Heisenberg’s aptitude. None of us saw it coming. My name is Thomas Ptacek and I endorse this terrible pun. Alex was the one who nonced.
A few years ago, Matasano couldn’t have hired Alex, because we relied on interviews and resumes to hire. Then we made some changes, and became a machine that spotted and recruited people like Alex: line of business .NET developers at insurance companies who pulled Rails core CVEs out of their first hour looking at the code. Sysadmins who hardware-reversed assembly firmware for phone chipsets. Epiphany: the talent is out there, but you can’t find it on a resume.
Our field selects engineers using a process that is worse than reading chicken entrails. Like interviews, poultry intestine has little to tell you about whether to hire someone. But they’re a more pleasant eating experience than a lunch interview.
Consider some basic elements of how we structure interviews.
First, in most companies, we assign them to random engineers. 5 candidates for the same headcount slot might be interviewed by 5 different teams.
Next, interviewers make up their own interviews. This is crazy. One team member wants developers to reason their way through PATRICIA trie searches. Another one really wants to see if you can code quicksort from memory. Optimize for this cache line. Cut the latency from this HTTP request.
Some engineers interview candidates collaboratively. Others do it adversarially. Some want to see code on a whiteboard. Others are happy just talking. And some interviewers, face it, suck (nobody trains them!) and ask trivia questions.
It gets worse. We’re unclear about selection criteria. The gauntlet of tricky technical questions is just the beginning. Driven in part by an oral tradition of how the last 20 years of technical job interviews have resulted in terrible hires, interviewers try to assess for “smart and gets things done”. In other words: “subjective and even more subjective x-factor”.
It is all love with me and this observation and I make it as a card-carrying member of the tribe to which it is directed, but here goes: there may be no cohort of professionals less qualified to assess barely-tangible socio-psychological attributes like “passion” and “confidence” than the modern software nerd.
Random employees conducting random interviews based in part on subjective psychological assessments, each producing not data but a “hire/no-hire” recommendation, reassembled by a hiring manager into a decision that would be made only marginally less rigorous if it also involved a goat sacrifice. That’s not a sentence, I know. The more I think about interviews, the more of my composure I lose.
Because here is the thing about interviews: they are incredibly hostile experiences for candidates.
For many people, I wonder if they might be among the most hostile experiences in all of life. In what other normal life experience is a group of people impaneled to assess (adversarially!) one’s worthiness with respect to their life’s work? By a jury that by design must say “no” far more often than “yes”.
I’m sorry, Alex, but I’m briefly dragging you back into my narrative.
By the time we interviewed Alex in person, we had already implemented a number of countermeasures to unreliable interviews. Alex had been told up front that an in-person interview meant we liked him a lot and there was a very good chance we’d hire him. I walked into the conference room to meet him. He appeared to be physically shaking. You didn’t need to be a psychologist to detect the nervous energy; it radiated from him, visibly, like in an R. Crumb cartoon.
Engineering teams are not infantry squads. They aren’t selected for their ability to perform under unnatural stress. But that’s what most interview processes demand, often (as is the case with every interview that assesses “confidence”) explicitly so.
I suspect we’re going to see this more clearly as development gets more specialized. Already many specialties are absurdly difficult to hire for, like software security, machine learning, distributed systems, kernel and embedded programming, and high-speed networking. Smart firms are going to recognize that it’s hard to staff talented machine learning programmers not because the field is intrinsically difficult, but because so few developers are given any opportunity to cultivate an aptitude. They’re screened out by interviews.
Confidence bias selects for candidates who are good at interviewing.
There are people who have the social skills to actively listen to someone else’s technical points, to guide a discussion with questions of their own, or to spot opportunities to redirect a tough question back to familiar territory. Those people build impressive resumes. They effortlessly pass “culture fit” tests. And a lot of them can’t code.
Confidence bias excludes candidates who don’t interview well.
For every genuinely competent and effective developer who can ace a tough dev interview, there are many more genuinely competent and effective developers who can’t. While we’re selecting for the ability to control a conversation, we’re missing the ability to fix broken 2-phase commits.
You can look at this like a moral problem. Me, I just see money hats. I am on a mission to eradicate the software developer job interview. I hope and expect to be pursuing that mission twenty years from now.
Meanwhile, let’s contain the damage. There are things you can do to make your hiring processes better. I’ve deployed these tactics. I’ve seen them work. They should be industry standard. They aren’t yet. Adopt them and profit.
Warm up your candidates.
If you’re like most teams, the first experience a candidate has with your selection process is an adversarial tech-out phone screen (or, if they’re lucky, the HR call that schedules that phone screen).
This sucks: it demands that a candidate start running the gauntlet without knowing what to expect. Not knowing what to expect makes candidates nervous. That’s a pointless handicap.
At my last firm, we had the “first-call” system. Every serious applicant got, on average, 30-45 minutes of director-level time on the phone before any screening began. The first-call was my responsibility for a year; once I learned how to do them, we carefully selected 2 other people to do them. Your first-call people need to be great at putting people at ease and selling the job. That call opened with anodyne questions about the candidate, followed by an AMA-style Q&A so the candidate understands the role, and concluded with an exhaustive explanation of the selection process and what to expect at each stage.
We worked from the assumption that a candidate’s resume, background, and even their previous experience had no bearing on their ability to perform the difficult and specialized work we did. So on that first-call, we’d gingerly ask the candidate some technical questions to find out how acquainted they were with our field. Many weren’t, at all.
Those candidates got a study guide, a couple of free books, and an open invitation to proceed with the process whenever they were ready. Those $80 in books candidates received had one of the best ROIs of any investment we made anywhere in the business. Some of our best hires couldn’t have happened without us bringing them up to speed.
Interviews suck. We all seem to understand that fact, but not its implications. Our field is chock-a-block with advice from employers to candidates on how best to navigate scary interviews. That’s all well and good, but it’s the hiring teams that pay the steepest price for poor, biased, and unreliable selection. It can’t be the candidate’s job to handle irrational processes. Unless you’re playing to lose.
Build work-sample tests.
Instead of asking questions about the kind of work you do, have candidates actually do the work.
Careful. I am not saying candidates should spend a 2-week trial period as a 1099 contractor. That’s a terrible plan: the best candidates won’t do it. But more importantly: it doesn’t work. Unlike a trial period, work sample tests have all three of these characteristics:
they mirror as closely as possible the actual work a candidate will be called on to perform in their job,
they’re standardized, so that every candidate faces the same test,
they generate data and a grade, not a simple pass/fail result.
You can’t do this with a trial period where a candidate gets paid to fix whatever random bugs are in the issue-tracker. Your goal is to collect data you can use for apples-apples comparisons, which means every candidate has to be working on the same problems. You have to design a test, create a scoring rubric, and iterate. But even the flimsiest work-sample outperforms interviews, so the effort pays dividends immediately.
Here’s a work-sample test we used: we built an electronic trading system in a single-file Sinatra project. We made its interface a custom binary protocol. We built an extremely rudimentary web interface that drove the protocol. Then we had candidates find flaws in the trading system.
Candidates need to code to attack this system, because there’s no off-the-shelf tool that speaks the wacky protocol we invented. They need some insight to see how they can get their hands on the raw protocol messages to reverse. They have to be comfortable diving into a piece of technology they’ve never seen before. They need to put all those attributes together and use them to kill a trading system.
This test is a couple hundred lines of code, written in a few hours. It out-predicts any interview we’ve ever done.
The same kind of process works for pure dev jobs. So, you’re a Rails shop? Take a Rails application you’ve actually built and deployed. Carve out some functional areas from the application: remove the search feature, or the customer order updater. Bundle up the app with all its assets, so that a single “vagrant up” gives a candidate an app that runs. Have them add back the feature you removed.
All this work sounds onerous and time-consuming. Two things: first, candidates seemed to love these tests. Second, the more assessment you do in work samples, the less you do in in-person interviews. When I started recruiting, our process from introduction to decision took over a month. When I finished, we averaged 2 weeks. Candidates expended about the same amount of effort, but now half of it was done from their home instead of our office.
For the last several years, work-sample tests were our most important hiring decision factor. We relied on them almost completely, and in doing so, we multiplied the size of our team and retained every single person we hired. (I was the first person to leave the company, with no fires and no quits, since taking over recruiting years ago.) Here’s what I think I learned from doing this:
Don’t have an elite-candidate fast path. This forces you to make the tests good, but more importantly, it collects extremely important data: “here’s what our test says about a candidate we’re sure we want to hire”.
Collect objective facts. Unit test coverage is a fact. Algorithmic complexity is a fact. Handling a known corner case is a fact. You might not use all these facts to make decisions when you start. But I have had the experience of going back through 9 months’ worth of work sample submissions to collect a new fact. It wasn’t fun. Err on the side of digesting as much data as you can.
Have a scoring rubric ready before you decide on a candidate. If you’re scoring candidates differently, you’re missing the point. This feels confining and unnatural at first. So, err on the generous side with your scoring, and down-select with interviews. Iterate until your interviews almost feel pointless. We got there!
Kill tests that don’t predict. You need multiple tests. Keep an eye on which ones you rely on consistently. We had candidates write a fuzz testing tool. That seemed like a great test, because we’d get a discrete blob of code to read. But it turned out that we almost never learned anything from it. Candidates seemed to like writing the code, but we were never in a hurry to read it. We stopped doing that test.
Prep candidates. Avoid gotchas. Your goal is to extract data from candidates. “Cleverness under pressure” is not a valuable data point. We let candidates do this work on their own time, from their own home, whenever they wanted to, and we provided tech support. Remember: you want the data even more than the pass/fail result.
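One way to picture “collect facts, then score against a fixed rubric” is a small record per submission. The sketch below is illustrative only: the fact names, the point values, and the `Submission`, `score`, and `rank` helpers are all invented for this example and are not the rubric or tests described above. Note that a submission can carry facts the rubric doesn’t use yet, so a new criterion can be scored later without re-reading old work.

```python
from dataclasses import dataclass, field

# Hypothetical rubric: (fact name, points awarded when the fact holds).
# It is fixed before any candidate is scored, so everyone is graded alike.
RUBRIC = [
    ("found_auth_flaw", 3),
    ("wrote_protocol_client", 2),
    ("handled_known_corner_case", 2),
    ("has_unit_tests", 1),
]


@dataclass
class Submission:
    candidate: str
    # Every objective fact recorded about the work, used by the rubric or not.
    facts: dict[str, bool] = field(default_factory=dict)


def score(sub: Submission) -> int:
    """Sum the points for each rubric fact the submission satisfies."""
    return sum(points for name, points in RUBRIC if sub.facts.get(name, False))


def rank(subs: list[Submission]) -> list[tuple[str, int]]:
    """Return (candidate, score) pairs, highest score first."""
    return sorted(((s.candidate, score(s)) for s in subs),
                  key=lambda pair: pair[1], reverse=True)


subs = [
    Submission("candidate-a", {"found_auth_flaw": True, "has_unit_tests": True}),
    Submission("candidate-b", {"found_auth_flaw": True,
                               "wrote_protocol_client": True,
                               "handled_known_corner_case": True,
                               "used_fuzzer": True}),  # recorded, not yet scored
]
ranking = rank(subs)
```

The output is a grade per candidate rather than a pass/fail verdict, which is the part that makes apples-apples comparison possible.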
Standardize and discount interviews.
Want to make a team of software professionals hate you? Require them to interview from a script.
When you think about it, making a hiring decision is one of the most empowering things a developer gets to do. It’s the closest they get to controlling the company’s destiny. No surprise then that they get attached to their favorite questions, and to the discretion they’re given by the process.
They’ll get over it.
You need to collect data. You need every candidate to get the same interview. You can’t make that happen if your team improvises the interview.
We designed three face-to-face interviews. Each took the form of an exercise (in our case, “verbally penetration testing” an application). Each produced a list of facts. The script made room for back-and-forth with the interviewer, required candidates to puzzle through a problem and ask questions of the interviewer. We made room at the end of each interview for free-form Q&A, but for the bulk of it, the interviewer recited a script, answered questions as the candidate drew things on a whiteboard, and wrote down results.
Interviewers hate it. Thomas’s First Law Of Interviewing: If interviewers enjoy it, you’re doing something wrong. But we kept doing it, because we found that we were comfortable with “why” we were making hire/no-hire decisions when we had facts to look at. We could look at the results of an interview and compare them to the same data generated by successful team members.
More often than not, when the interviewer’s gut-check “yes/no” departed from the data, we’d find that the untrustworthy source was the interviewer. Despite training them not to, they routinely factored “confidence” into their result. Biases like that are pernicious!
You should also consider eliminating phone screens. We didn’t, but we did the next best thing: I simply began disregarding all but the most notably bad phone screen results. You could get selected out of our process by being an asshole on a phone screen, but there was little else you could do to fail.
Phone screens carry almost every liability of interviews but further hamstring the candidate and the interviewer by occurring at a distance, being hastily scheduled, and, for the hiring team, having implicitly lower stakes than the in-person interview. After improving the rest of our selection process, I asked myself, when would I be comfortable denying a candidate the opportunity to further demonstrate their ability based on the outcome of a phone call? The answer was “virtually never”.
Ask yourself some questions about your hiring process.
Is it consistent? Does every candidate get the same interview?
Does it correct for hostility and pressure? Does it factor in “confidence” and “passion”? If so, are you sure you want to do that?
Does it generate data beyond hire/no-hire? How are you collecting and comparing it? Are you learning both from your successes and failures?
Does it make candidates demonstrate the work you do? Imagine your candidate is a “natural” who has never done the work, but has a preternatural aptitude for it. Could you hire that person, or would you miss them because they have the wrong kind of GitHub profile?
Are candidates prepped fully? Does the onus for showing the candidate in their best light fall on you, or the candidate?
We all compete in the hottest market for talent we’ve ever seen. The best teams will come up with good answers for these questions, and use those answers to beat the market.
Thanks, so much, to Patrick, Erin, Julia, Nate, Dylan, more different Nate, and reviewer X for reading this. It was so much worse before they did. You’ll never buy a drink in my town when I’m around again!