Should the Randomistas (Continue to) Rule?

Affiliation

Georgetown University

Summary

"The questionable claims made about the superiority of RCTs as the 'gold standard' have had a distorting influence on the use of impact evaluations to inform development policymaking."

The years since 2000 have seen a dramatic growth in the use of randomised controlled trials (RCTs) to support evidence-based policymaking in developing countries. In this Center for Global Development (CGD) working paper, Martin Ravallion looks at the unconditional preference for RCTs that he notes "has permeated the popular discourse, with discernible influence in the media, development agencies and donors, as well as among researchers and their employers." While acknowledging the value of RCTs for some purposes, Ravallion here engages with advocates of RCTs (the "randomistas"), asking: How did RCTs become so popular? And is their popularity justified?

RCTs are a type of impact evaluation (IE) in which access to a programme (the "treatment") is randomly assigned to some units, with others randomly set aside as controls. To measure the programme's impact, one then compares mean outcomes for these two samples. Ravallion cites the earliest RCT in the International Initiative for Impact Evaluation (3ie) database as an IE by Jamison et al. (1981) of a World Bank research project on education interventions (textbooks and radio lessons) to improve the math scores of students in Nicaragua. Since that time, one group stands out as sparking the rise of the randomistas: the Abdul Latif Jameel Poverty Action Lab (J-PAL). Founded in 2003 (as the Poverty Action Lab) and based in the Department of Economics at the Massachusetts Institute of Technology (MIT), J-PAL reportedly had 927 completed and ongoing RCTs in 81 countries as of December 2018. "On top of its own RCTs, J-PAL has clearly influenced the shift in emphasis in empirical development economics more broadly toward RCTs."
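The difference-in-means comparison described above can be sketched in a few lines. All numbers here are invented for illustration: a simulated population of 1,000 units, a true programme effect set to 5, and arbitrary baseline variation and noise. This is a minimal sketch of the estimator, not any specific study's data.

```python
import random

random.seed(0)

# Hypothetical illustration: estimate a programme's impact by comparing
# mean outcomes across randomly assigned treatment and control groups.
population = list(range(1000))
random.shuffle(population)           # random assignment

def outcome(unit_id, treated):
    base = 50 + (unit_id % 10)       # invented baseline variation across units
    effect = 5 if treated else 0     # assumed true programme impact
    return base + effect + random.gauss(0, 2)

treated = [outcome(u, True) for u in population[:500]]
control = [outcome(u, False) for u in population[500:]]

# The estimated average treatment effect (ATE) is the gap in mean outcomes.
ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
print(round(ate_hat, 2))  # close to the assumed true effect of 5
```

Because assignment is random, baseline differences between the two groups wash out in expectation, which is the core appeal of the design.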

The paper provides an overview of the theory of IE, defining terms like average treatment effect (ATE), spillover effects ("contamination"), instrumental variable (IV), mean-squared error (MSE), ordinary least squares (OLS), intent-to-treat (ITT), Hawthorne effects, principle of equipoise, equivalence trial, marginal treatment effects (MTEs), selective trials, and external validity. Ravallion contrasts RCTs with "observational studies" (OSs), in which the assignment of treatment status is purposive rather than random. Ravallion explains that, "[w]hile some OSs are purely descriptive, others attempt to control for the pre-treatment differences between treated and un-treated units based on what can be observed in data, with the aim of making causal inferences about the impact." An oft-cited reason for the "gold-standard" ranking of RCTs is the supposed unbiasedness of an ideal RCT, defined as one in which the trial's treatment status is also chosen randomly (in addition to drawing random samples from the two populations, one treated and one not), and the only error is due to sampling variability. But Ravallion challenges the notion that RCTs are always preferable to OSs - less biased, for example. Figure 2 illustrates a hypothetical case, showing that even a biased OS can be closer to the truth than an (unbiased) RCT.
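The bias-versus-precision trade-off behind that hypothetical case can be made concrete with a short simulation. The numbers are entirely assumed for illustration: a small, unbiased but noisy RCT-style estimator versus a large observational study whose estimator carries a modest bias but much less sampling variance. Measured by mean-squared error (MSE = bias² + variance), the biased estimator can still land closer to the truth on average.

```python
import random

random.seed(1)

TRUE_EFFECT = 10.0  # assumed true impact, for illustration only

def rct_estimate():
    # Unbiased but imprecise: a small trial with high sampling variance.
    return random.gauss(TRUE_EFFECT, 4.0)

def os_estimate():
    # Biased by +1 (imperfect control for confounders) but precise (big sample).
    return random.gauss(TRUE_EFFECT + 1.0, 0.5)

# Approximate each estimator's MSE over many replications.
N = 10_000
rct_mse = sum((rct_estimate() - TRUE_EFFECT) ** 2 for _ in range(N)) / N
os_mse = sum((os_estimate() - TRUE_EFFECT) ** 2 for _ in range(N)) / N

print(f"RCT MSE ~ {rct_mse:.1f}")  # ~ variance = 16
print(f"OS  MSE ~ {os_mse:.1f}")   # ~ bias^2 + variance = 1.25
```

Under these assumed parameters the biased OS estimator has far lower MSE, which is the point of Ravallion's Figure 2: unbiasedness alone does not guarantee being closer to the truth.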

Even if we agree that an RCT is better at removing bias in a specific setting, Ravallion argues, RCTs are not necessarily the preferred statistical tool when feasible. For instance, "contrary to the claims about clean identification of the mean causal impact using randomized assignment - assumptions and models are often required in practice." One problem is that there can be interference within the clusters, whereby non-participants in the selected treatment clusters are impacted by the programme. For example, Ravallion et al. (2015) studied the use of an entertaining movie to teach people their rights under India's National Rural Employment Guarantee Act. Access to the movie was randomly assigned across villages, with people free to choose whether to watch it. Some did not, but they might have talked with others who did, and this turned out to be a channel of impact on knowledge. The cluster randomisation had to be combined with a behavioural model of why some people watched the movie (Alik-Lagrange and Ravallion, 2018). Only then could the direct treatment effect (watching the movie) be isolated from the indirect effect (living in a village with access to the movie). In this example, the spillover effects within clusters violated the exclusion restriction.
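A stylised simulation can show why such spillovers matter. Everything here is invented for illustration (effect sizes, watch rates, knowledge scores); it does not reproduce the actual study. Villages are randomly assigned access to a movie, only some residents watch it, and non-watchers in treated villages pick up a smaller "spillover" gain. The village-level intent-to-treat (ITT) comparison then blends the direct and indirect channels rather than isolating either one.

```python
import random

random.seed(2)

# Assumed effects on a knowledge score, for illustration only.
DIRECT, SPILLOVER = 8.0, 3.0
N_VILLAGES, N_PEOPLE = 200, 50

def knowledge(treated_village, watched):
    base = random.gauss(40, 5)           # invented baseline knowledge
    if watched:
        return base + DIRECT             # direct effect of watching
    if treated_village:
        return base + SPILLOVER          # indirect effect via conversations
    return base

treated_means, control_means = [], []
for v in range(N_VILLAGES):
    treated_village = v < N_VILLAGES // 2   # cluster-randomised access
    scores = []
    for _ in range(N_PEOPLE):
        watched = treated_village and random.random() < 0.6  # assumed take-up
        scores.append(knowledge(treated_village, watched))
    (treated_means if treated_village else control_means).append(sum(scores) / N_PEOPLE)

itt = sum(treated_means) / len(treated_means) - sum(control_means) / len(control_means)
print(round(itt, 2))  # blends direct (0.6 * 8) and spillover (0.4 * 3) channels, ~ 6.0
```

Randomisation alone cannot separate the two channels here; as in the study described above, doing so requires an additional behavioural model of who chooses to watch.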

In addition, although it is not inherently unethical to do an RCT (as long as this is deemed to be justified by the expected benefits from new knowledge), there are ethical concerns that arise with RCTs, which Ravallion explores. Beyond issues with obtaining informed consent, minimising risks, and protecting privacy and confidentiality, it clearly can happen that, in an RCT, a programme is assigned to some who do not need it, and withheld from some who do. As the discussion explores: "If evaluators are to take ethical validity seriously then some development RCTs will have to be ruled out as unacceptable given that we are already reasonably confident of the outcomes - that the gain from knowledge is not likely to be large enough to justify the ethically-contestable research..." Ravallion contends that, "If pressed, many randomistas acknowledge the ethical concerns, though they rarely give them more than scant attention. They assume that their RCTs generate benefits that outweigh such concerns. Whether that is true is rarely obvious."

Furthermore, Ravallion points to "a serious risk of distorting the evidence-base for informing development policymaking, given that an insistence on doing RCTs generates selection bias in what gets evaluated." The emphasis on identifying causal impacts using RCTs has deflected attention from potentially illuminating OSs. In short, "[i]f we are really concerned about obtaining unbiased estimates of the impact of the portfolio of development policies it would surely be better to carefully choose (or maybe even randomly choose!) what gets evaluated, and then find the best method for the selected programs, with an RCT as only one option."

Thus, in the final analysis, Ravallion argues that a better alignment of research efforts with policy challenges requires:

  • Making clear that "scientific" and "rigorous" evidence is not confined to RCTs;
  • Demanding a clear and well-researched statement of the expected benefits from an RCT, to be weighed against the troubling ethics;
  • Making explicit the behavioural assumptions underlying randomised evaluations;
  • Going beyond mean causal impacts to include other parameters of policy interest and better understanding the mechanisms linking interventions to outcomes; and
  • Viewing RCTs as only one element of a toolkit for addressing the knowledge gaps relevant to the portfolio of development policies.

"Going forward, pressing knowledge gaps should drive the questions asked and how they are answered, not the methodological preferences of some researchers. The gold standard is the best method for the question at hand."

Editor's note: This CGD working paper was originally published on August 16 2018. Click here to access that earlier version [PDF].

Source

CGD website, December 6 2019.