# Please be more careful when interpreting the SO Developer Survey

These types of surveys are interesting and useful, but each year I find myself pulling my hair out at poor analyses by the press and internal analysts. As an example:

The analysis of the "Evaluating Competence" question:

> We asked respondents to evaluate their own competence, for the specific work they do and years of experience they have, and almost 70% of respondents say they are above average while less than 10% think they are below average. This is statistically unlikely with a sample of over 70,000 developers who answered this question, to put it mildly.

is seriously flawed, and represents a misunderstanding of what "statistically likely" means.

First of all, there are no inferential statistics computed here, only summary statistics. Implicit in this analysis is a comparison between the distribution of competence in the population and a distribution of competence in the sample. You cannot say whether the difference between your sample distribution and the population is "statistically likely" or not without inferential statistics.

If you did run an analysis using inferential statistics, you could make a statement about how likely it is that a random sample of the population would produce a distribution with the characteristics this sample has. You would not be able to draw a conclusion about whether respondents are biased in their evaluation of their own competence, or whether the sample itself was biased. Given the survey's methodology, we must assume a biased sample. Inferential statistics have minimal value in this context (comparing the sample distribution to the population of developers) because respondents were not selected using random sampling methods.
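To make the point concrete, here is a minimal sketch of what such an inferential test would look like, using the figures from the quote (70% "above average", n = 70,000) and an assumed null hypothesis, chosen purely for illustration, that 50% of a random sample would rate themselves above average:

```python
import math

# Hypothetical one-proportion z-test. Figures from the quoted analysis:
# 70% of n = 70,000 respondents rated themselves above average.
# Null hypothesis (an assumption for illustration only): in a random
# sample, 50% would rate themselves above average.
n = 70_000
p_hat = 0.70
p_null = 0.50

se = math.sqrt(p_null * (1 - p_null) / n)  # standard error under the null
z = (p_hat - p_null) / se
print(f"z = {z:.1f}")  # a huge z-score, p-value effectively zero

# Note what the test does and does not say: the observed proportion is
# wildly inconsistent with the null *if the sample were random*. It
# cannot distinguish "respondents over-rate themselves" from "the
# sample is biased" -- both produce exactly the same statistic.
```

The astronomical z-score is exactly why the test is uninformative here: with a non-random sample, rejecting the null tells you nothing about which explanation (respondent bias or sample bias) is responsible.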

This is a simple but crucial principle that, apparently, we don't hammer on enough in introductory statistics courses. Everyone seems able to parrot "correlation isn't causation," but an equally important lesson gets lost: you cannot generalize from a non-random sample!

### Sample size doesn't save a biased sample

Consider the case of the Literary Digest election poll of Landon vs. Roosevelt. A huge sample (2.4 million people) was used to generalize to the electorate at large, and it predicted Landon would win with 57% of the vote. In fact, Roosevelt won in a landslide with 61%. A much smaller Gallup sample (about 50,000) used sampling methods that allowed for generalization, and correctly predicted the Roosevelt landslide.

The challenges to generalization and inference here are the same ones the 1936 Literary Digest poll faced: selection bias and nonresponse bias. 70,000 respondents is a lot, but you cannot generalize from a non-random sample, even a big one. Consider that, with over 20 million developers globally, you would need about 1 million respondents to match the Literary Digest sample's share of its population of interest. And we know how that turned out.
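The Literary Digest failure is easy to reproduce in a toy simulation. The setup below is entirely assumed for illustration: a synthetic electorate of one million voters with the true 61% Roosevelt share, a large "Digest-style" sample in which Landon voters are three times as likely to respond, and a small simple random sample in the Gallup style:

```python
import random

random.seed(1936)

# Assumed synthetic electorate: 61% Roosevelt (1), 39% Landon (0),
# matching the actual 1936 landslide margin.
N = 1_000_000
electorate = [1] * int(0.61 * N) + [0] * int(0.39 * N)

# Large biased sample: response probability depends on the vote
# (illustrative rates -- Landon voters respond 3x as often).
biased = [v for v in electorate
          if random.random() < (0.08 if v else 0.24)]

# Small simple random sample: every voter equally likely to be drawn.
srs = random.sample(electorate, 2_000)

def roosevelt_share(sample):
    return sum(sample) / len(sample)

print(f"biased sample:  n={len(biased):>7,}  "
      f"Roosevelt share = {roosevelt_share(biased):.1%}")
print(f"random sample:  n={len(srs):>7,}  "
      f"Roosevelt share = {roosevelt_share(srs):.1%}")

# The biased sample is ~70x larger, yet it predicts a Landon win;
# the small random sample lands within sampling error of the true 61%.
```

Increasing the biased sample's size only shrinks its random noise around the *wrong* value; no amount of additional respondents corrects a selection mechanism that favors one group.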