The past twelve months have been discouraging for international efforts to tackle poverty, and Africa is no exception. The African Development Bank revised its economic outlook last year to reflect the changing reality, with “up to 37,5 million additional people to enter extreme poverty in 2020” on the continent. While the importance of development assistance has increased during the ongoing pandemic, its fiscal fallout will increase pressure on donor governments to cut foreign aid budgets.
The UK is a prominent recent example, with a Foreign Office minister resigning after the aid budget for 2021 was cut by a third. Meanwhile, African governments are caught between a rock and a hard place, needing to spend more to combat COVID-19 while facing falling revenue due to curfews and the collapse of tourism.
In light of growing fiscal constraints, the identification and pursuit of programs that deliver value for money are bound to receive increased attention from donors and African governments alike.
Since the turn of the century, randomized controlled trials (henceforth RCTs) have steadily gained popularity in the international development community as “rigorous” and “evidence-based” tools for informing policy and evaluating programs. Organizations such as the Abdul Latif Jameel Poverty Action Lab (J-PAL) and Innovations for Poverty Action (IPA) have been very successful in persuading donor and partner governments throughout sub-Saharan Africa to adopt their approach to collecting evidence and evaluating development programs.
Prominent RCTs implemented in Africa include the study of child deworming on school outcomes in Kenya, fighting corruption to improve schooling in Uganda, and evaluating the impact of voter knowledge initiatives in Sierra Leone, to name a few. Yet they remain controversial, not least because there are serious questions about whether they generate reliable findings that can be reproduced at scale.
Opportunities for evidence-based policymaking
Impact evaluations of development aid projects and programs employing RCTs often provide reliable evidence with clear implications for policy. Numerous RCTs have influenced national policy in the developing world, resulting in the upscaling of community-based preschool in Mozambique and the provision of commercial bank loans to schools in Pakistan, for example.
A classic RCT works by dividing areas (districts, towns, provinces, and so on) or individuals into groups that receive a “treatment” – such as greater information about how political leaders operate, or a new set of committees to check on the availability of drugs and doctors in a hospital – and groups that do not (the control). By comparing how much things change in the treatment group with how much they change in the control group, researchers can determine how effective the intervention was – and hence whether it is worth investing in.
This offers a more systematic way to evaluate the impact of new policy interventions, and one that promises to be more “scientific” by demonstrating that the desired change did not occur in other similar areas, and so the relationship is one of cause and effect rather than chance correlation.
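The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular study was analysed: the village names, attendance figures, and the function names `assign_treatment` and `estimated_effect` are all invented for the example, and a real evaluation would also report uncertainty around the estimate.

```python
import random
from statistics import mean


def assign_treatment(units, seed=0):
    """Randomly split units into a treatment half and a control half."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def estimated_effect(baseline, endline, treated, control):
    """Mean change among treated units minus mean change among controls."""
    change = {unit: endline[unit] - baseline[unit] for unit in baseline}
    treated_change = mean(change[unit] for unit in treated)
    control_change = mean(change[unit] for unit in control)
    return treated_change - control_change


# Hypothetical data: 10 villages with a made-up school attendance rate (%).
villages = [f"village_{i}" for i in range(10)]
treated, control = assign_treatment(villages, seed=42)
baseline = {v: 60.0 for v in villages}

# Invented outcomes: every village improves by 2 points over the period,
# and treated villages gain a further 5 points from the intervention.
endline = {
    v: baseline[v] + 2.0 + (5.0 if v in treated else 0.0) for v in villages
}

effect = estimated_effect(baseline, endline, treated, control)
print(f"Estimated effect: {effect:.1f} percentage points")
```

Because the 2-point improvement occurs in both groups, it cancels out in the comparison, leaving only the 5 points attributable to the treatment. This is the sense in which the control group separates cause and effect from background change.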
The limitations of RCTs
While it is true that RCTs have a number of valuable advantages, there are also reasons to be cautious in how they are used, and how their findings are interpreted.
First, RCTs – and the projects they test – can be extremely expensive, which can generate frustration when the research finds that the intervention has not been successful. Although this can be viewed as saving money – because it avoids scaling up projects that would not work – it risks creating the perception that money that could have been spent on development was lost to an experimental process that did not contribute to better policy.
Second, some RCTs have only modest claims to external validity, with their generalizability being limited at best. As an example, if a randomized evaluation in Uganda shows that providing more information about how MPs perform through the use of “scorecards” does not change the way that citizens respond to them, this does not mean it would not work elsewhere. Uganda is an authoritarian context and features a dominant ruling party. It could be the case that scorecards would enhance demands for accountability in more democratic political systems.
Even if baseline individual, societal and political characteristics are accounted for, unobserved factors such as different social norms may mean that implementing the program elsewhere will not lead to the outcomes indicated by the RCT. While donors may present results from randomized evaluations as “hard” evidence supporting the scale-up of a certain government program, policymakers in partner countries should regard the evidence with scrutiny. Particular attention must be paid to the setting in which the RCT was conducted: Results from an evaluation conducted in similar settings are more likely to generalize, but their replicability is still likely to be imperfect.
The external validity problem also includes the so-called “piloting bias”: Pilot projects in developing countries are often implemented by local nongovernmental organizations (NGOs) with significant experience in running and evaluating similar programs. Even if a randomized evaluation concludes that a program has the desired impact, this may not be the case when responsibility for running the project is transferred to the state. The gain from scaling up interventions may therefore be smaller than predicted.
What to look for
Before scaling up a successful pilot program, public officials should be careful to consider reasons why extending the intervention to a wider set of individuals or areas may change the way that it plays out.
Scaling up development programs may cause problems as well as generate solutions, for a number of reasons. For example, if a pilot program that transferred cash to female villagers chosen at random proves successful at improving important socio-economic indicators, transferring cash to all women may induce general equilibrium effects that were hitherto unaccounted for. Unexpected changes could include inflation (due to a district-wide increase in consumer demand), a shift in attitudes towards labor, and a negative and potentially violent response from men whose status has shifted. These are not necessarily reasons not to scale up a successful pilot, but such unintended side effects do need to be thought through and dealt with.
RCTs with a short monitoring period and high specificity to a certain context are relatively more prone to this issue than others.
In addition, many policies are less “amenable” to randomized evaluations: While RCTs may inform policymakers about the effectiveness of nutritional supplements, teacher monitoring and remedial education, interventions such as broader institutional reform and large infrastructural projects tend to be beyond their reach.
While some donors and INGOs continue to advocate for RCTs, policymakers in developing countries should be aware that focusing solely on evidence from RCTs will limit their options. Rather than being shelved altogether, questions that cannot be answered by randomization are best approached with other methods of impact evaluation.
RCTs and, more broadly, impact evaluations in African countries have mostly been conducted by international organizations and researchers from donor nations. There may be some advantages to this, including the ability to harness external expertise and the fact that external groups may be seen as more objective when it comes to analyzing sensitive policy interventions (such as Mexico’s conditional cash transfer programs).
All too often, however, this means that much of the funding for RCTs leaves the country, while precious little skill and experience is transferred to African researchers so that they can conduct such projects in the future.
This is not always the case: some organizations conducting RCTs, such as J-PAL, have created country offices in several developing countries, training local researchers to independently conduct randomized evaluations and thus facilitating the subsequent creation of local evidence by members of the local research community. But much more needs to be done in this regard.
African governments should demand that all such organizations operate in this way, while also providing training programs to support domestic researchers and institutions.
Supporting the co-creation of evidence by encouraging local researchers to engage with international organizations will mean that impact evaluation will not have to be outsourced in future, and that those who run RCTs will have in-depth knowledge of the context in which they are operating. In this manner, policymakers can ensure that RCTs not only produce evidence, but also generate indirect benefits through knowledge spillovers for the local academic and civic community.
Implications for policy
RCTs are an important tool in the policymaker’s arsenal, then, but their results must be critically assessed with respect to their wider applicability – and it should not be assumed that RCTs are always superior to other methods.
The high cost of some RCTs, and the fact that their novelty is starting to wear off, mean that many donors are likely to become more cautious about deploying them in the next decade. This would be a blow for policy-relevant research but might also represent an important opportunity, encouraging us to think carefully about which RCTs make most sense, to combine RCTs with other forms of research, and to recognize that plural methods are likely to generate the most illuminating and reliable results.
Levan Veshapidze studies in the Development Economics Master program at Georg-August-Universität Göttingen. Besides the effectiveness of development aid, his research interests involve the political economy of development cooperation.