Is evidence-based policy facing a crisis?

By Adrian Brown

Adrian Brown from the Centre for Public Impact says policy should be more experiment-based rather than solely evidence-based.

The 2008 financial crisis was, in part, triggered by a financial innovation. Collateralised Debt Obligations (CDOs) made it possible to slice up highly risky, sub-prime loans and repackage them as AAA-rated bonds. When CDOs were combined with leverage (raising further debt by using CDOs as collateral), a local problem morphed into a systemic flaw that almost brought down the entire global financial system.

The fact that none of the major players – not the regulators, the ratings agencies, the banks or the investors – spotted the fundamental flaw in CDOs suggests a form of collective deception. It turned out that everyone knew far less than they thought they did.

Could it be that the evidence-based policy community is suffering from a similar deception?

Scared Straight

Take Scared Straight, the prison visitation programme in the United States that is perhaps the single most celebrated example of evidence-based policy in action. Scared Straight first came to prominence in 1978 when an Oscar-winning documentary of the same name highlighted a scheme in which young offenders were exposed to the realities of life behind bars. The idea was that by taking part in a short prison visit, in which they were intimidated and verbally abused by inmates, they could be scared into turning away from a life of crime.

Today it is widely accepted that such programmes not only fail to reduce reoffending – they actually increase it. A 2012 systematic review (a type of study generally considered to be the gold standard in the evidence-based policy community) confirmed the finding from an earlier version of the same review in 2002: “not only does [Scared Straight] fail to deter crime but it actually leads to more offending behaviour”.

Taking this cue, those seeking to promote evidence-based policy around the world added Scared Straight to the list of policies that should be avoided. For example, the College of Policing in the UK ranks the evidence against Scared Straight and other juvenile awareness programmes as “very strong”, stating that “programmes which use organised prison visits… lead to more offending behaviour”. Similar statements have been made by evidence-promoting bodies in many other countries.

As a result, prison visitation programmes are strongly discouraged and no evidence-led funder will support them. Everyone agrees they are both harmful and a waste of money. So far, so straightforward.

What counts as very strong evidence?

But could it really be that simple? It is, on the face of it, a little surprising that any intervention for young offenders that lasts just a few hours would have such a dramatic effect (positive or negative).

The 2012 review is based on a meta-analysis of seven studies, all of which are from the United States and all of which are extremely old, ranging from 1967 to 1982. The studies themselves are very small, generally including just a few dozen participants, and most are unpublished. Only two show a statistically significant effect. Of those two, one (according to the review’s authors) has “dramatic” problems with randomisation and the other is very old indeed (from 1967).

Digging deeper, the unpublished 1967 study was from Michigan and was based on two groups of 30 boys. In the test group, 2 boys dropped out so the total was in fact 28, of whom 12 committed “delinquent acts” within 6 months. In the control group, only 5 got in trouble. The researchers defined “delinquent acts” broadly, including even minor infringements such as probation violations (a different definition from the one used in the other studies).

With such small numbers, even very minor changes can swing the overall result dramatically. For example, if the two boys hadn’t dropped out of the test group and everything else had stayed the same (12 of 30 rather than 12 of 28), then the result would no longer have been significant. Furthermore, as this study includes no information on the nature of the prison visit, it is impossible to know whether it was actually similar to the 1978 Scared Straight intervention (e.g. relying on intimidation and scare tactics) or not.

It was a judgement call by the reviewers to include this study, but if we exclude it (as well as the one with “dramatic” randomisation problems), then the systematic review would shift from finding a significant result overall to no significant result.

So, to zoom back out again, the “very strong” evidence against Scared Straight is based on seven US studies from 1967-1982, only two of which are statistically significant, and both of these have potentially serious flaws. In fact, in a sort of butterfly effect, if just one more boy in the 1967 Michigan control group had got into trouble then the whole edifice would collapse.
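The fragility of the 1967 Michigan figures can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes a two-sided Fisher's exact test at the conventional 5% level, which may not be the test the original study or the systematic review used, and the function name `fisher_exact_two_sided` is a hypothetical helper written for this example. It looks at the single study in isolation, not at the pooled meta-analysis.

```python
from math import comb


def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher's exact test p-value for two groups.

    events_a of n_a boys offended in group A, events_b of n_b in group B.
    Assumption: "two-sided" means summing every table that is no more
    likely than the observed one, with the margins held fixed.
    """
    total_events = events_a + events_b
    lowest = max(0, total_events - n_b)
    highest = min(n_a, total_events)

    def weight(k):
        # Number of ways to get k events in group A and the rest in group B.
        return comb(n_a, k) * comb(n_b, total_events - k)

    observed = weight(events_a)
    weights = [weight(k) for k in range(lowest, highest + 1)]
    # Integer comparison, so ties are handled exactly.
    as_or_more_extreme = sum(w for w in weights if w <= observed)
    return as_or_more_extreme / sum(weights)


# (offenders in test group, test group size, offenders in control, control size)
scenarios = {
    "as reported (12 of 28 vs 5 of 30)": (12, 28, 5, 30),
    "no dropouts (12 of 30 vs 5 of 30)": (12, 30, 5, 30),
    "one more control boy (12 of 28 vs 6 of 30)": (12, 28, 6, 30),
}

for name, counts in scenarios.items():
    p = fisher_exact_two_sided(*counts)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.3f} ({verdict} at the 5% level)")
```

Under these assumptions, the figures as reported fall just below the 5% threshold, while either small change (two boys not dropping out, or one more boy in the control group getting into trouble) pushes the p-value above it.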

So, just as CDOs repackaged low-quality mortgages and turned them into high-quality investments, the evidence-based policy community has taken low-quality studies and repackaged them as “very strong” evidence that prison visitation programmes don’t work. Not just for young boys, not just in the United States and not just in the past, but apparently for everyone, everywhere, forever.

Straight outta Cirebon

Why does this matter? Well, I think this is a prime example of overreach, where a rather limited and outdated set of studies from a particular place is used to make a bold statement about what works in general now. The result is that prison visitation programmes are no longer tried under test conditions, so we really don’t know whether they would be better designed or more effective in Cirebon in 2019 than in Detroit in 1968. Perversely then, evidence-based policy is actually preventing us from innovating and collecting any new evidence or insights about what might work.

Other limitations of the evidence-led approach are well documented. For example, in order to collect evidence at all, we need to be able to develop testable hypotheses, which means simplifying complex situations into binary choices. It is much easier to test whether a specific intervention works or not (simple, binary) than the role that family relationships play in reducing offending (complex, non-binary). The latter may be far more important, but the evidence is classed as weaker because it is harder to collect.

All of which highlights an important difference between a focus on evidence and a focus on experimentation. The former encourages us to make definitive statements about “what works”, despite the obvious transferability problems of large swathes of public policy. The latter encourages practitioners to constantly reflect on what they are doing and improve things in situ, sharing what they are learning with others as they go. Our work on the enablement mindset in government is exploring how experimentation can be encouraged.

Clearly, there is a role for evidence in policymaking, just as there is no doubt a role for collateralised debt in finance. The danger comes if we go too far and build too much on the basis of too little.

Adrian Brown is the Executive Director of the Centre for Public Impact, where this post was also published.