This week on my podcast, I read my November 2020 Locus column, Past Performance is Not Indicative of Future Results, a critical piece on machine learning and artificial intelligence that takes aim at the fallacy that improvements to statistical inference will someday produce a conscious, cognitive software construct. It’s a follow-up to my July 2020 column, Full Employment, which critiques the fear/aspiration of automation-driven unemployment.
The problems of theory-free statistical inference go far beyond hallucinating faces in the snow. Anyone who’s ever taken a basic stats course knows that “correlation isn’t causation.” For example, maybe the reason cops find more crime in Black neighborhoods is that they harass Black people more, with pretextual stops and searches that give them the basis to unfairly charge them – a process that leads to many unjust guilty pleas, because the system is rigged to railroad people into pleading guilty rather than fighting charges.
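You can see how this works with a toy simulation (mine, not from the column – the numbers are made up for illustration): two neighborhoods with the exact same underlying offense rate, where one gets stopped ten times as often. The “crime data” that comes out the other end measures police attention, not behavior – and a theory-free model trained on it will dutifully learn to send more cops to the over-policed neighborhood.

```python
import random

random.seed(0)

# Hypothetical setup: both neighborhoods have the SAME true offense rate.
TRUE_OFFENSE_RATE = 0.05

# But neighborhood "B" is subjected to 10x as many stops as "A".
STOPS = {"A": 1_000, "B": 10_000}

# Recorded "crime" = offenses that police actually observe via stops.
recorded = {
    hood: sum(random.random() < TRUE_OFFENSE_RATE for _ in range(n))
    for hood, n in STOPS.items()
}

print(recorded)
# B's recorded "crime" is ~10x A's, even though the underlying
# rates are identical – the data encodes stop intensity, not behavior.
```

Feed that table to a predictive-policing model and it will “discover” that B is ten times more dangerous than A, which justifies more stops in B, which generates more recorded crime in B – a feedback loop, not a finding.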
Understanding that relationship requires “thick description” – an anthropologist’s term for paying close attention to the qualitative experience of the subjects of a data-set. Clifford Geertz’s classic essay of the same name talks about the time he witnessed one of his subjects wink at the other, and he wasn’t able to determine whether it was flirtation, aggression, a tic, or dust in the eye. The only way to find out was to go and talk to both people and uncover the qualitative, internal, uncomputable parts of the experience.
Quantitative disciplines are notorious for incinerating the qualitative elements on the basis that they can’t be subjected to mathematical analysis. What’s left behind is a quantitative residue of dubious value… but at least you can do math with it. It’s the statistical equivalent of looking for your keys under a streetlight because it’s too dark where you dropped them.