Interview by Richard Marshall.
‘The equations of physics are algebraic and symmetrical, whereas causal relationships are directional. To mathematize causal statements we need a new calculus that (1) reflects this asymmetry and, at the same time, (2) accounts for the symmetries we see in correlated observations.’
‘The smoking-cancer debate may have contributed to the revolution by demonstrating painfully how badly a language for causation is needed. But the first concrete step toward the revolution (not counting Wright) was taken by Don Rubin, in 1974, who adopted Neyman notation for counterfactuals and applied it to observational studies (to predict the effect of a pending intervention).’
‘I see sparks of uprisings among economists following me on Twitter, hoping to return economics to its glorious days as the leading Queen of causal analysis. This however requires that economists accept causal diagrams as a working language, as health scientists have done. Sadly, for economists, this cultural shift seems more traumatic than conversion to voodoo.’
Judea Pearl is a computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks. He is also credited with developing a theory of causal and counterfactual inference based on structural models. He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, “for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”. Here he discusses the causal revolution, why stats and big data can’t answer any causal questions, the two languages of the causal revolution, the ladder of causation, classical statistics, Bayesian networks, Hume, Lewis and counterfactual reasoning, free will, human minds as causal not statistical machines, how this changes the way data is used, mediation, and why we should note that we are smarter than our data.
3:AM: What made you become, among other things, a philosopher?
Judea Pearl: Philosophers do not consider me one of them. Perhaps because I have degrees in Engineering and Physics, or because I show no interest in digging into the irrelevant writings of ancient philosophers. My interest in philosophy was sparked by my high school teachers, who insisted on teaching us about the lives of Pythagoras, Socrates, Archimedes, Epicurus and Diogenes. I recall that, unlike most of my classmates, I was deeply concerned with the foundational issues: what do we mean by an electric field, how do we know that 1 Ampere measured one way is the same as 1 Ampere measured differently, etc.
I later read quite a few books in the philosophy of science, from Reichenbach to Nelson Goodman, from Popper to Toulmin and A. J. Ayer. Through their influence, I went back and read Locke and Hume. But I could not stand Hegel and Kant.
3:AM: You’ve written about the ‘Causal Revolution’ as a new science that is revolutionising science and you think will bring about amazing results in years to come. It was an approach that surfaced at about the same time that statistics emerged – but unlike stats it never took off until about thirty years ago. Firstly, then, can you sketch for us the language gap in science – the failure to discuss causes – that meant that up until very recently we were unable to answer ‘why’ questions adequately?
JP: Correct, and I am sure philosophers will find this statement hard to swallow, as do many statisticians. But when I ask them to write down a mathematical expression for the sentence: “The rooster’s crow does not cause the sun to rise” or “The barometer falling does not bring rain” the swallowing becomes easier. The equations of physics are algebraic and symmetrical, whereas causal relationships are directional. To mathematize causal statements we need a new calculus that (1) reflects this asymmetry and, at the same time, (2) accounts for the symmetries we see in correlated observations. Without this calculus we cannot even represent the question “Why,” let alone answer it.
3:AM: We’re pretty familiar with statistics and data driven answers – the bookstands are full of ‘big data’ books and the like – but is it your view that this is just a symptom of the causal vocabulary prohibition and that although we know not to confuse correlation with causality we’ve never been clear just what causality is?
JP: We have never internalized how profound this dichotomy is, which Nancy Cartwright immortalized in: “No causes in, no causes out.” This means that statistics and big data cannot answer ANY causal question, for example, what will happen if we intervene (say, ban cigarettes) or what would have happened had we acted differently, the latter being a counterfactual question. The astonishing success of big data and machine learning reflects our underestimating how much can be achieved by the low-hanging fruit of model-free curve-fitting. But when we look at the limitations unveiled by the calculus of causation we understand that human-level AI requires two more layers: intervention and counterfactuals.
3:AM: The Causal Revolution has two languages – can you sketch them for us – one’s a diagram and the other a symbolic language. Why two and not one?
JP: When you ask a scientist to provide you with knowledge about a domain, say medicine or social interactions, it is usually qualitative and declarative. For example, vitamin C cures scurvy but has no effect on heart failure. On the other hand, the questions that scientists normally wish to answer are quantitative and hypothetical, for example, what increase in unemployment can we expect if we were to raise minimum wage by one dollar an hour.
Thus, scientific knowledge is preferably encoded in causal diagrams and scientific questions are asked in the language of counterfactuals. Both are derivatives of Structural Causal Models (SCM) which provide the formal semantics for counterfactuals and the calculus of causation.
3:AM: The intervention operator is very important isn’t it – how does it at a stroke remove paradoxes brought about by confusing ‘seeing’ with ‘doing’ – and how does it give power to thought experiments such as when we think about interventions without enacting them, or counterfactual thinking? I guess what I’m asking about here is your central metaphor – the Ladder of Causation!
JP: The do-operator simply emulates in our model what the action do(barometer down) would change in the world, which is different from see(barometer down). The latter predicts rain, the former does not. Counterfactual thinking requires an additional step, retrospection, or updating history, and is situated at the highest level of the causal hierarchy, as represented by the Ladder of Causation.
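The seeing/doing distinction can be made concrete with a toy simulation (a minimal sketch; the 0.3 rain prior and 0.9 barometer reliability are invented numbers, not from the interview):

```python
import random

random.seed(0)

def sample(do_barometer=None):
    """One draw from a toy structural model: Rain -> Barometer.
    P(rain) = 0.3; the barometer falls when it rains, 90% reliably.
    do_barometer, if given, overrides the barometer's own mechanism."""
    rain = random.random() < 0.3
    barometer_down = rain if random.random() < 0.9 else not rain
    if do_barometer is not None:
        barometer_down = do_barometer  # intervention cuts the Rain -> Barometer arrow
    return rain, barometer_down

N = 100_000
obs = [sample() for _ in range(N)]

# Seeing: P(rain | barometer down) -- conditioning raises the probability of rain.
seen = [rain for rain, down in obs if down]
p_rain_given_seen = sum(seen) / len(seen)

# Doing: P(rain | do(barometer down)) -- forcing the barometer changes nothing upstream.
done = [sample(do_barometer=True) for _ in range(N)]
p_rain_given_do = sum(rain for rain, _ in done) / N

print(round(p_rain_given_seen, 2))  # well above the 0.3 prior (about 0.8)
print(round(p_rain_given_do, 2))    # stays near the 0.3 prior
```

Seeing the barometer fall is evidence of rain; making it fall is not, because the intervention replaces the mechanism that connected the two.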
3:AM: Why is all this much better than just classical statistics – and why didn’t the world listen to Sewall Wright? Was it when statisticians were trying to answer the question, ‘Does smoking cause lung cancer’ that the causal revolution got going?
JP: Classical statistics resides on the lowest level of the Ladder, deprived of the power to reason about intervention or retrospection. The world could not listen to Sewall Wright because he spoke in a language that sounded like Swahili to scientists from the 1920s to the 1960s – causal diagrams.
The smoking-cancer debate may have contributed to the revolution by demonstrating painfully how badly a language for causation is needed. But the first concrete step toward the revolution (not counting Wright) was taken by Don Rubin, in 1974, who adopted Neyman notation for counterfactuals and applied it to observational studies (to predict the effect of a pending intervention). Interestingly, this was two years after Stalnaker’s famous letter to David Lewis (1972), in which he proposed a “possible worlds” semantics for actions and counterfactuals. But there was no communication whatsoever between the two communities. More recently (1994) the SCM framework provided semantics to counterfactuals which can be given a “closest-world” interpretation.
3:AM: How do Bayesian networks fit into all this – aren’t they just stats and therefore lacking the causal dimension?
JP: Correct. Formally, Bayesian networks are just efficient evidence-to-hypothesis inference machines. However, in retrospect, their success emanated from their ability to “secretly” represent causal knowledge. In other words, they were almost always constructed with their arrows pointing from causes to effects, thus achieving modularity. It is only due to our current understanding of causality that we can reflect back and speculate on why they were successful; we did not know it then.
3:AM: David Hume discusses counterfactuals and David Lewis says he gave two definitions, one which involved counterfactuals. Was Hume right in placing counterfactual thinking at the heart of causality thinking?
JP: Absolutely. Both Hume and Lewis understood that counterfactuals are more accessible to the mind than “causes,” and even more than “regularities”. It is only now, however, that we understand why — counterfactuals are computable from our model of the world using a simple, 3-step process.
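The three-step process can be sketched on a deliberately simple structural model (the model and the observed values are invented purely for illustration):

```python
# Toy structural causal model:  X := u_x ;  Y := 2*X + u_y
def f_y(x, u_y):
    return 2 * x + u_y

# Observed (actual) world: X = 1, Y = 3.
x_obs, y_obs = 1, 3

# Step 1 (abduction): infer the background circumstances from the evidence.
u_y = y_obs - 2 * x_obs          # u_y = 1

# Step 2 (action): modify the model, forcing X to its counterfactual value.
x_cf = 2

# Step 3 (prediction): recompute Y in the modified model, keeping the same u_y.
y_cf = f_y(x_cf, u_y)
print(y_cf)  # 5  -- "had X been 2, Y would have been 5"
```

The key point is that the background noise recovered in step 1 carries the particulars of the actual world into the hypothetical one.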
3:AM: One thing that seems written into this approach is freewill, morality and social responsibility. You write: ‘ Counterfactuals are the building blocks of moral behavior as well as scientific thought. The ability to reflect back on one’s past actions and envision alternative scenarios is the basis of free will and social responsibility.’ Does it follow from this that the causal revolution implies we should have certain attitudes and beliefs about morality, social responsibility and freewill, that it implies even metaphysical conclusions?
JP: The causal revolution implies that we now have the tools to represent these cognitive functions on a digital machine, which essentially means that we will be able to understand the neural pathways that support these capabilities. Whether these attitudes will emerge spontaneously in counterfactual-based reasoning systems I can’t speculate; we may need to program them in. The main point is that the causal revolution casts the “free will” enigma as a computational issue. Permit me to quote:
“Granted that free will is [or may be] an illusion, why is it so important to us as humans to have this illusion? Why did evolution labor to endow us with this conception? Gimmick or no gimmick, should we program next generation computers to have this illusion? What for? What computational benefits does it entail?”
3:AM: You go on to write that ‘The algorithmization of counterfactuals invites thinking machines to benefit from this ability and participate in this (until now) uniquely human way of thinking about the world’ and this makes clear how important you think the causal revolution is and will be in the future to the development of Artificial Intelligence. Can you say why this is so? What roadblocks to achieving human-level intelligence are solved by causal thinking? And are we to conclude that the human mind is a causal not a statistical inference machine?
JP: There is no question that the human mind is a causal, not a statistical, inference machine. How else can we explain why Simpson’s paradox evokes surprise in humanoids? And how else can we explain our uniform understanding of everyday counterfactual statements (e.g., had Cleopatra’s nose been shorter… or had Hillary won the election…) despite their hypothetical nature? Finally, how else can we explain our unsatiated craving for “understanding”, and why raw data, even processed and summarized, do not satisfy this craving?
3:AM: So can we now compute the probability of answers to questions we might find in law, or economics or education, such as ‘Is man-made climate change a sufficient cause of a heat wave?’
JP: We have the methods of computing those probabilities, which is not to be dismissed lightly. Having a method allows you to zoom in on the right kind of data and the right kind of assumptions that need to be substantiated before spending precious resources on estimating the wrong quantities. For lawyers and policy makers to accept probabilities of counterfactuals as normative evidence for decisions would probably take a generation of training and debates.
It is hard to imagine, however, that they would reject the logical consequences of their own verbal definitions, which are cast in counterfactual terms. We clearly see the dominant role counterfactuals play in the legal definition of liability and hiring discrimination.
3:AM: How does this causal flowchart approach change the way data is used, indeed, when we should and shouldn’t collect data? And why don’t you think, like some in AI argue, and others in other areas using statistics like economics, that we can ignore the causal modelling and just rely on data mining and deep learning?
JP: We covered AI and machine learning in previous questions, where I explained why current methods of model-blind learning are bound to hit the wall of inadequacy as we advance toward human level intelligence. As to economics, I believe the data-centric school of Hendry, Granger and Sims is no longer being pursued seriously, and the potential outcome framework that currently dominates the “experimentalist” school in econometrics is just a temporary over-reaction to decades of confusion by regression analysis. I see sparks of uprisings among economists following me on Twitter, hoping to return economics to its glorious days as the leading Queen of causal analysis. This however requires that economists accept causal diagrams as a working language, as health scientists have done. Sadly, for economists, this cultural shift seems more traumatic than conversion to voodoo.
3:AM: What is what you call mediation and why is it an important topic in all this?
JP: Mediation analysis aims to uncover the causal pathways along which changes are transmitted from causes to effects. It has enormous scientific and practical implications. Scientifically, mediation tells us how nature works. Practically, it enables us to predict behavior under a richer variety of interventions. For example, whether swamps cause malaria via “mal-air” or mosquito bites makes a big difference in how we can protect ourselves on our next visit to the swamps. If the “mal-air” theory is valid, then breathing masks should replace mosquito nets. The challenge of assessing mediated effects from data was to isolate them despite our inability to disable the direct effects. Fortunately, counterfactual logic enabled us to perform this isolation and gave rise to the “Mediation Formula”, which quantifies (based on data) the fraction of the observed effect “owed” to any given mediating path.
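The Mediation Formula can be illustrated numerically for a binary treatment X, mediator M, and outcome Y (all conditional probabilities below are made up for illustration, and no confounding is assumed):

```python
# Hypothetical, illustrative quantities (assumed unconfounded):
p_m_given_x = {0: 0.2, 1: 0.8}                 # P(M=1 | X=x)
e_y = {(0, 0): 0.1, (0, 1): 0.5,               # E[Y | X=x, M=m]
       (1, 0): 0.3, (1, 1): 0.7}

def p_m(m, x):
    """P(M=m | X=x) for binary M."""
    return p_m_given_x[x] if m == 1 else 1 - p_m_given_x[x]

def e_y_do(x):
    """Total effect ingredient: E[Y | do(X=x)] = sum_m E[Y|x,m] P(m|x)."""
    return sum(e_y[(x, m)] * p_m(m, x) for m in (0, 1))

# Total effect of the treatment on the outcome.
te = e_y_do(1) - e_y_do(0)

# Mediation Formula for the natural indirect effect:
#   NIE = sum_m E[Y | X=0, m] * (P(m | X=1) - P(m | X=0))
# i.e., shift the mediator as the treatment would, but hold X itself at 0.
nie = sum(e_y[(0, m)] * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))

print(round(te, 2), round(nie, 2), round(nie / te, 2))
# -> total effect 0.44, indirect effect 0.24, so ~55% of the effect is mediated
```

With these (fabricated) numbers, roughly half of the treatment's effect flows through the mediator, which is exactly the kind of fraction the formula is designed to extract from data.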
3:AM: Why do you see positive things in the fact that we are smarter than our data? Surely there are ‘Age of Ultron’ threats from AI characters who communicate with us in our own language about policies, experiments, explanations, theories, have regrets, take responsibility, have free will, and obligations—and, eventually, make their own moral decisions. Human monsters can do this as well – the fear is that AI monsters are not just smarter than their data, they’re smarter than we are?
JP: The message “smarter than data” that ends our Introduction chapter was not meant to be a call for complacency but, rather, a call for a change of strategy. It meant: “Let’s examine the reasons we are smarter than raw data and let’s emulate the missing ingredients on digital machines so as to endow them with similar smartness.”
3:AM: And finally, are there five books or articles other than your own, that you can recommend to the readers here at 3:AM that will take us further into your philosophical world?
JP: Three and an article!
Causation, Prediction, and Search, Spirtes, Glymour and Scheines (MIT Press, 2000)
Actual Causality, Halpern (MIT Press, 2016)
The Laws of Belief, Spohn (Oxford University Press, 2012)
ABOUT THE INTERVIEWER
Richard Marshall is still biding his time.
First published in 3:AM Magazine: Saturday, September 8th, 2018.