Quickly gain an understanding of how modern AI systems fail, and how causality can help
Modern AI systems have made it possible to tackle many problems previously thought to be out of reach of computers. You have probably heard of some of these successes, such as:
- GPT-3: Generates paragraphs of human-like text based upon any initial prompt you provide it.
- AlphaFold: Predicts how proteins take shape in 3D space. A true breakthrough in modern biology.
- DALLE-2: Creates incredibly detailed and realistic images from text descriptions.
These systems are so convincing that they have led even some of the people developing them to believe they are sentient.
However, despite the successes, many of these systems can be thought of as technological parrots. Parrots can mimic their owners, but do not have a true awareness of what they are saying, nor why they are saying it.
Similarly, modern AI systems can mimic the patterns they have learnt from previous data, without having the true context of the problem being solved, nor understanding why a given prediction is returned. Modern AI systems are parrots at massive scale (GPT-3 was trained on approximately 3 billion web pages) and with huge societal implications.
The end result of this parroting is that modern AI systems suffer from the following three B's:
- Blind: they cannot distinguish association from causation and lack context on the problems they are used to solve.
- Biased: they readily learn spurious correlations, and with them harmful biases, from the data.
- Brittle: they can fail on unseen or unfamiliar data, even data which differs only slightly from what they were trained on.
These three B’s mean that modern AI systems are ill-suited to the nuanced, complex, and high-risk applications to which they are being applied. Let’s explore how a causal approach can help.
Modern AI systems are blind to the type of relationship between data points and lack context on the problems which they are being used to solve.
To illustrate this consider the relationship between years of experience and income. Typically, someone’s experience is correlated with their income: more experience trends with a higher salary. This is also true in reverse: a higher income trends with more experience. You can call this two-way correlation an associational relationship.
The other type of relationship is a causal one. In this case, one variable causes the change in another. Someone earns their income because of their years of experience*. Unlike the associational relationship, causality is one-way; an individual’s experience isn’t caused by the income they earn.
Causal techniques provide you with the tools to separate association from causation. By intervening on the system and setting someone’s experience to a given value, you can observe how this would change their income. Using interventions you can determine the type of relationship between experience and income (causal or associational), and in which direction it flows (experience causes income). You can think of interventions as a way of answering certain types of “what if” questions: if I were 45 instead of 31, how much would I earn?
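As a toy sketch, an intervention can be simulated with a simple structural causal model. The linear relationship and coefficients below are invented purely for illustration; a real model would be fit to data:

```python
import random

# A minimal structural causal model (SCM) in which income is
# caused by years of experience. The intercept, slope, and noise
# level are illustrative assumptions, not taken from real data.

def sample_income(experience, noise_sd=5_000):
    """Income (in £) as a function of experience plus unobserved factors."""
    return 20_000 + 2_500 * experience + random.gauss(0, noise_sd)

def intervene_on_experience(experience, n=10_000):
    """do(experience = value): fix experience and average the sampled outcome."""
    return sum(sample_income(experience) for _ in range(n)) / n

random.seed(0)
# "What if I had 20 years of experience instead of 5?"
baseline = intervene_on_experience(5)
counterfactual = intervene_on_experience(20)
print(f"Expected income at 5 years:  £{baseline:,.0f}")
print(f"Expected income at 20 years: £{counterfactual:,.0f}")
```

Because the intervention fixes experience directly, the difference between the two averages reflects only the causal effect of experience, not any correlation it may have with other variables.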
Modern AI systems are very good at identifying associations in the data and these relationships are fundamental to their success. However, because these systems have traditionally been blind to causality, they have repeatedly learnt misleading associations from the data. These misleading associations, or spurious correlations, can be pernicious and dangerously harmful to AI systems.
Intuitively, a correlation is spurious when we do not expect it to hold in the future in the same way it held in the past. You can find a great list of spurious correlations here. The elimination of spurious correlations is the basis of the randomised controlled trial, the scientific gold standard for proving a hypothesis.
Causal AI is powerful because it allows you to identify and eliminate spurious correlations using the existing observed data, without the need to run a controlled trial.
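To make this concrete, here is a small simulation (with made-up variables X and Y and a confounder Z) showing how a confounder produces a spurious correlation, and how intervening on X removes it:

```python
import random

# Sketch: a confounder Z causes both X and Y, producing a spurious
# correlation between X and Y even though X has no effect on Y.
# All relationships below are invented for illustration.

random.seed(1)
n = 50_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]       # Z -> X
y = [2 * zi + random.gauss(0, 1) for zi in z]   # Z -> Y (X does NOT cause Y)

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# Observationally, X and Y look strongly related...
print(f"corr(X, Y) = {corr(x, y):.2f}")

# ...but intervening on X (do(X = x0)) severs its link to Z,
# so Y no longer varies with the value we set X to.
x_do = [random.gauss(0, 1) for _ in range(n)]   # X set independently of Z
print(f"corr(do(X), Y) = {corr(x_do, y):.2f}")
```

This mirrors what a randomised controlled trial does physically: randomising X breaks its dependence on every confounder at once.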
Spurious correlations are everywhere and are regularly learnt by modern AI systems. These correlations frequently introduce harmful bias.
To illustrate how causal techniques can help, let’s extend the income prediction example considered before, by adding a number of other variables which are shown in Table 1.
Due to historical biases within the observed data shown in Table 1, AI systems trained on it learn to associate the female sex with lower income. In order to ensure your model generates useful and safe predictions this bias needs to be controlled for.
Causal techniques allow the creation of a causal diagram which shows the relationships between the variables. Each arrow within this diagram demonstrates how one variable causally impacts another, e.g. experience has a causal effect on income. This allows you to explicitly represent the biases within the data.
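As a sketch, such a diagram can be represented in code as an adjacency mapping. The edges below are illustrative assumptions for the income example (not learnt from data), with sex acting as a source of the historical bias described above:

```python
# A hypothetical causal diagram for the income example, written as a
# mapping from each variable to the variables it directly causes.
# The edges are illustrative assumptions, not learnt from data.
causal_diagram = {
    "sex": ["experience", "income"],  # historical bias: sex affects both
    "experience": ["income"],
    "income": [],
}

def parents(diagram, node):
    """Variables with a direct causal arrow into `node`."""
    return [v for v, children in diagram.items() if node in children]

print(parents(causal_diagram, "income"))  # → ['sex', 'experience']
```

Writing the diagram down explicitly is what makes the biases visible: any variable with an arrow into both a sensitive attribute's effects and the outcome is a candidate for controlling.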
Once you have a causal diagram which you believe accurately represents how the data are related to one another, it can be manipulated to control for a range of different factors, including the elimination of bias.
One manipulation could be an intervention on the sex of the German female engineer to see how that would impact their income. Alternatively, by controlling for sex, you can remove the influence of sex from the causal diagram. The result is an unbiased estimate of the effect of the other factors on income.
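A minimal sketch of controlling for sex by stratification, on synthetic data whose numbers are invented to mimic the bias described above:

```python
import random
from statistics import mean

# Synthetic data in which sex influences both access to experience and
# income directly (a stand-in for the historical bias in the text).
# Every coefficient here is made up for illustration.

random.seed(2)
rows = []
for _ in range(20_000):
    female = random.random() < 0.5
    experience = random.gauss(8 if female else 12, 3)  # biased access to experience
    income = (20_000 + 2_000 * experience
              - (5_000 if female else 0)
              + random.gauss(0, 3_000))
    rows.append((female, experience, income))

# A naive comparison mixes the direct effect of sex with experience.
naive_gap = (mean(r[2] for r in rows if not r[0])
             - mean(r[2] for r in rows if r[0]))

def slope(data):
    """Least-squares slope of income on experience."""
    xs, ys = [d[1] for d in data], [d[2] for d in data]
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Controlling for sex: estimate the effect of experience within each
# stratum, so the influence of sex is held fixed.
within_female = slope([r for r in rows if r[0]])
within_male = slope([r for r in rows if not r[0]])
print(f"Naive male-female income gap: £{naive_gap:,.0f}")
print(f"Effect of experience (female stratum): £{within_female:,.0f}/year")
print(f"Effect of experience (male stratum):   £{within_male:,.0f}/year")
```

Within each stratum the estimated effect of experience recovers the true per-year value, whereas the naive gap conflates the direct effect of sex with the biased distribution of experience.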
Modern AI systems are delicate, requiring careful fine-tuning to ensure they are configured correctly. Despite being trained on vast amounts of data, they can still fail in ways which are surprising or trivial from a human perspective. Figure 4 shows how an image processing algorithm fails to recognise a cow when it is on the beach, as opposed to in a field. This is despite the image classifier having been shown thousands of images of cows during its training.
For the types of modern AI systems referred to in this blog, the ability to reliably predict on unseen and unfamiliar data is commonly referred to as generalisation. Causal machine learning approaches generalisation differently, as both the observed data and the corresponding causal diagram are considered (see Figure 5 below).
Therefore, causal models attempt to generalise from behaviour under one set of conditions, to behaviour under another set. Causal models should be selected based upon criteria which test their stability to changing conditions, e.g. when interventions are performed. Scientists follow this mantra when performing controlled trials to identify causal relationships.
The result is that causal models are more robust to changing conditions in the real world and can adapt more rapidly to dramatic shifts in the data. These advantages have led AI researchers to begin embedding these notions of generalisation, taken from causal AI, into the systems they are building.
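The cow example can be mimicked with a toy simulation contrasting a predictor built on a spurious feature (a green, grassy background) with one built on a causal feature (the animal's shape). Everything below is synthetic and illustrative:

```python
import random
from statistics import mean

# Two toy predictors for "is there a cow in the image?":
# one keyed on a causal feature (cow-like shape) and one on a
# spurious feature (grassy background). All probabilities invented.

def sample(grass_given_cow, n=10_000, seed=0):
    """Generate (shape, grass, cow) triples for one environment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        cow = rng.random() < 0.5
        grass = rng.random() < (grass_given_cow if cow else 0.2)
        shape = cow if rng.random() < 0.95 else not cow  # causal, slightly noisy
        data.append((shape, grass, cow))
    return data

def accuracy(data, feature_index):
    """Accuracy of predicting `cow` directly from one feature."""
    return mean(1.0 if row[feature_index] == row[2] else 0.0 for row in data)

train = sample(grass_given_cow=0.9)            # cows mostly in fields
beach = sample(grass_given_cow=0.05, seed=1)   # cows on the beach

print(f"Spurious (background) accuracy: train={accuracy(train, 1):.2f}, "
      f"beach={accuracy(beach, 1):.2f}")
print(f"Causal (shape) accuracy:        train={accuracy(train, 0):.2f}, "
      f"beach={accuracy(beach, 0):.2f}")
```

When the environment shifts, the background-based predictor collapses while the shape-based predictor holds steady, which is the stability property causal model selection is designed to test for.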
This was a brief introduction to causal AI, discussing some of the advantages it brings and how these can help to overcome the blind, biased, and brittle nature of modern AI algorithms.
Here are the three key takeaways:
- Causal Diagrams: By specifying the relationships between the observed data you can get a better understanding of the problem domain, reduce bias, and perform manipulations on the diagram to simulate a broad range of situations.
- Manipulation: Scenarios can be modelled and what-if questions can be answered by manipulating causal AI models. These manipulations allow for deeper exploration of the problem, and provide the ability to answer questions about hypothetical situations.
- Generalisation: Causal AI generalises better to unseen data as it is built to adapt to changing environments, and not solely changing data.
*There is clearly more which goes into determining someone’s income level. These factors would be built out in a more complete model (see the section on “Bias”), but to keep things simple we’ll initially consider only years of experience as impacting income, with the remaining factors kept as hidden variables.