Improving the Impact of Social Programs Through Better Evaluation

The latest episode of Mathematica’s On the Evidence podcast explores the idea that a more comprehensive approach to evaluation, including study of a program’s design and implementation, maximizes a program’s chances of success. The episode draws from a convening in January at Mathematica’s D.C. office at which experts from the federal government, philanthropy, academia, and other research institutions discussed insights from the new Oxford Handbook of Program Design and Implementation Evaluation, edited by Anu Rangarajan, a senior fellow at Mathematica.

At the event, Rangarajan said the book is meant to bridge “a troubling division in the field, where much of the emphasis is around impact evaluations, potentially missing learnings from a lot of other types of evaluations.” Throughout the life cycle of a program, researchers can use a range of methods to understand how a program was designed, piloted, implemented, and scaled up. Impact evaluations are only one tool at a researcher’s disposal, but these other evaluation methods “are often underutilized,” Rangarajan said. Combining various evaluation methods might improve the quality of the research and the usefulness of the information being provided to the organizations funding and implementing social programs.

The episode features Rangarajan as well as seven other speakers who expand upon the idea that a more comprehensive approach to evaluation could provide more useful information about whether a program is working, and if not, how it could be tweaked to work. The following speakers also appear in the episode:

  • Emilie Bagby, director, international education, Mathematica
  • Douglas J. Besharov, professor, University of Maryland School of Public Policy
  • Melissa Chiappetta, senior education advisor, Latin America and the Caribbean, U.S. Agency for International Development
  • Paul Decker, president and chief executive officer, Mathematica
  • Michelle Sager, managing director for strategic issues, U.S. Government Accountability Office
  • Michael Woolcock, lead social scientist, Development Research Group, World Bank
  • Alix Zwane, chief executive officer, Global Innovation Fund

Listen to the full episode.

Transcript

[MICHAEL WOOLCOCK]

The quest for smart people is to figure out what works and to then imagine pretty quickly, I think, that they have literally discovered something that really does work everywhere on the basis of their rigorous impact evaluation […] those kinds of emphases on just trying to search for an apex for a single best practice, what-works solution leaves unanswered many questions, such as why do many social programs have such a poor track record? What’s going wrong? Is this just bad design? It could be that, but it could also be poor implementation. It could be a context that’s not conducive to what’s going on. It could be an underlying theory of change that itself misspecifies or sets unrealistic or unreasonable expectations about what a certain intervention can achieve.

[J.B. WOGAN]

I’m J.B. Wogan from Mathematica and welcome back to On the Evidence. 

Earlier this year, Mathematica hosted a forum in its D.C. office about the role that evaluation should play in improving social programs. The impetus for the forum, which was cosponsored by the Association for Public Policy Analysis and Management, was the new handbook from Oxford University Press, conceived of and edited by Anu Rangarajan, a senior fellow at Mathematica who helped launch the international research division now known as Mathematica Global. 

The forum distilled key ideas from the book’s 29 chapters into a 90-minute event with eight speakers. Many of those presentations could be their own TED Talks. If you like this episode, I encourage you to visit mathematica.org, where we have a recording of the entire forum. What I’m going to try to do here is distill even further, providing a highlight reel of sorts, with some necessary context interspersed.

OK, with that preamble out of the way, let’s start with Anu Rangarajan. In 2015, she was at a meeting between Oxford University Press and fellow academics when someone raised the idea of writing a book about impact evaluations of training programs. 

[ANU RANGARAJAN]

That sounded like a really good idea and I suggested, “Why don’t you also include the findings and lessons from process and implementation studies,” to which the response was, “Well, I’m an impact evaluation expert. I don’t know much about process and implementation studies.” So that highlighted to me a troubling division in the field, where much of the emphasis is around impact evaluations and potentially missing learnings from a lot of other types of evaluations.

[J.B. WOGAN]

Impact evaluations are just one kind of evaluation in policy research. Other studies can reveal important information about how a program was designed to work and how it was implemented. While an impact evaluation can generate a conclusion about whether a given program produced the intended results, it alone can’t tell you why it worked, or failed to work, and whether it would work in other contexts. 

A couple years after the meeting, Anu came across an article that said the vast majority of social programs that undergo evaluation fail to show impacts. As someone who had spent the better part of three decades conducting exactly these types of impact evaluations, first on domestic welfare programs in the United States and later on development programs abroad, Anu had firsthand experience with this problem. 

[ANU RANGARAJAN]

And it got me to thinking, why do most social programs underperform; and what can we do to improve the chances of a program showing success?

[J.B. WOGAN]

Anu’s intuition was that improving a program’s chances of showing success would mean evaluators tapping into a wider array of research and using insights from other forms of evaluation to iterate on and improve programs. She pitched the idea of a handbook to Doug Besharov, an editor at Oxford University Press, and got the green light to begin gathering submissions. Paul Decker, the president and chief executive officer at Mathematica, remembers when Anu first told him about the book idea, bringing together voices from program design and evaluation to highlight how both worlds have evolved to meet changing needs.

[PAUL DECKER]

I was struck by the simplicity of the concept and the complexity of connecting these particular dots.

[J.B. WOGAN]

But that’s good, Paul said, because today’s global problems are themselves complex.

[PAUL DECKER]

For evaluation and evidence to have the kind of impact it should on programs and on policies and on people, it requires bringing different perspectives, different approaches, and different methodologies to the table. 

[J.B. WOGAN]

What’s needed, Paul said, are modern, borderless solutions. 

[PAUL DECKER]

That’s as true in how programs are designed, funded, and implemented, as it is in how evaluators are taught, how evaluation teams are formed, and how technology is brought to the table to amplify what’s possible through evaluation.

[J.B. WOGAN]

The overarching thesis of Anu’s book, that appropriate evaluation approaches must be used throughout a program's life cycle, challenges the primacy of large-scale randomized control trial impact evaluations as the best and most important form of policy analysis. It may take time for evaluators to embrace the more inclusive and nuanced approach she envisions.  

[MICHAEL WOOLCOCK]

Paradoxically, the more elite your training, and the more fancy the university you go to, the more you in fact find yourself caught up in contributing to a what-works kind of agenda.

[J.B. WOGAN]

That’s Michael Woolcock, one of the more than 60 contributors to the book Anu edited. He is the lead social scientist in the World Bank’s Development Research Group. Under a what-works kind of agenda, he says, 

[MICHAEL WOOLCOCK]

The quest for smart people is to figure out what works and to then imagine pretty quickly, I think, that they have literally discovered something that really does work everywhere on the basis of their rigorous impact evaluation and also quickly crown this with the name of a best practice and then travel the world talking about what they’ve discovered regarding what works.

[J.B. WOGAN]

Some solutions really do work across all contexts, Michael is quick to point out. Cochlear implants are an example of an intervention with broad applicability. The implants are small electronic devices that provide a sense of sound to people who are deaf or severely hard of hearing.

[MICHAEL WOOLCOCK]

But there’s a lot we do where those technical aspects are just one small aspect of a much bigger array of factors that have to work together. To answer those kinds of questions, we have to have a much richer understanding of how things work and why they work. A standard impact evaluation, no matter how rigorous the methodology that underpins it, is largely about trying to find average treatment effects and largely looking at outcomes and impacts rather than the variation in those outcomes and then where, how, why, and for whom those particular variations might exist. So those kinds of emphases on just trying to search for an apex for a single best practice, what-works solution leaves unanswered many questions, such as why do many social programs have such a poor track record? What’s going wrong? Is this just bad design? It could be that, but it could also be poor implementation. It could be a context that’s not conducive to what’s going on. It could be an underlying theory of change that itself misspecifies or sets unrealistic or unreasonable expectations about what a certain intervention can achieve.

[J.B. WOGAN]

Another issue with many traditional impact evaluations, Michael says, is their retrospective nature. By the time they render a judgment about a program’s effectiveness, it’s often too late to change course.

[MICHAEL WOOLCOCK]

It also doesn't answer these questions around what we can do to maximize a project’s chance of success. What do we do mid-course when we sense that things are okay and maybe they could get better, or are we heading for a trainwreck? How do we avoid that? 

[J.B. WOGAN]

After an impact evaluation, funders want to know if an effective program can be expanded and replicated elsewhere. Or, if the evaluation did not show impacts, was that because the program can’t work, or because something about the specific circumstances of this program in this place made it fail? Without careful study of how a program was designed and implemented, you probably won’t know.

[MICHAEL WOOLCOCK]

Much of what we’re trying to convey in the handbook as a whole is that policies are as good as their implementation.

[J.B. WOGAN]

And when a program isn’t implemented properly, an evaluation that finds no impact can mean a potentially effective program won’t be scaled up or continued. 

[ALIX ZWANE]

As funders, we need to be thinking about what incentives we are creating.

[J.B. WOGAN]

That’s Alix Zwane, Chief Executive Officer of the Global Innovation Fund, another speaker at the forum. She told a story about a program in rural Bangladesh designed to combat seasonal hunger that underwent a randomized control trial funded by her organization. The study did not show the desired impacts, but it yielded insights into why: there was good reason to believe that a local partner had provided assistance to the wrong target population. With a tweak in implementation, the program might still work (and in fact, in smaller-scale pilots that hadn’t suffered the mistargeting issue, it had shown promising evidence of effectiveness).

[ALIX ZWANE]

The NGO working on this program was indeed public and transparent about the negative findings. However, they framed this not as a shortcoming in targeting and implementation but as a definitively negative result. The program did not work and should be stopped.

[J.B. WOGAN]

The experience made Alix reflect on the dangers of letting the binary results of an impact evaluation determine the fate of an innovative program. 

[ALIX ZWANE]

Now, look, all of us – I’ll include myself completely in this – have said part of why we do impact evaluations is because we want to spend more money where things work and less money where things don’t. But that prescription doesn't mean we should be so risk-averse that we are unable to tolerate inevitable hiccups that happen in the messy, real world. What does it mean for a communication or branding that it’s better to declare failure than to admit a misstep in implementation?

[J.B. WOGAN]

Evaluation even has a role to play before a program is implemented. 

[MELISSA CHIAPPETTA]

We don’t look at the system as a whole. 

[J.B. WOGAN]

That’s Melissa Chiappetta, a Senior Education Advisor for the Latin American and Caribbean Region at USAID. 

[MELISSA CHIAPPETTA]

We see this at USAID especially because we’re divided into sectors. And so we’re looking in silos at education and agriculture and nutrition; whereas, that’s not the way the world works, right? So when I’m looking at education, I can’t just look at what’s happening to kids in schools. I also need to look at what’s happening to them in their households.

[J.B. WOGAN]

At the forum, Melissa shared a story about an education activity that USAID was implementing in Malawi. The intervention had three parts. 

[MELISSA CHIAPPETTA]

So we were training teachers, we were delivering books, and we were also working with parents to ensure that they were supporting their students in the classroom. What we found is that there was very little impact. This was a model that’s been used in other places, right, over time, but yet it just wasn’t working. We kept using it for years and years, and it seemed as though it should be working because it was working in other contexts.

[J.B. WOGAN]

As USAID evaluated the program, Melissa noticed that the research wasn’t taking into account a common feature of successful education interventions: class size.

[MELISSA CHIAPPETTA]

We weren’t looking at how many kids were in a classroom. If you work in the field of education, you know that the number of kids to a classroom actually doesn't matter that much up to 40, right? Once we go past 40, it can start to make a difference. So when I was observing classrooms in Malawi, what I found is 250 kids in one classroom sitting under a tree, outside in the shade of the tree with no fragment of spare shade space for anybody else to sit next to them. I saw about three textbooks for that classroom of 250 kids.

[J.B. WOGAN]

Research on social programs like the one in Malawi could start with a comprehensive needs assessment or formative evaluation, conducted before the program’s design and implementation. In this case, a needs assessment might have identified the problems with student-to-teacher ratios and insufficient learning materials, and it might even have caught other factors outside the classroom that were impeding student learning, Melissa said.

[MELISSA CHIAPPETTA]

So we kept saying, “Why is this not working? Why is this not working?” I said, “Let’s look at the elephant in the room. We’re not addressing the fact that these kids don’t have the space they need; they don’t have the individualized support that they need; they don’t have the books that they need; and also, many of them are coming to school undernourished. They don’t have the nourishment that they need at home, and they don’t have the health care that they need. So we really have to look at that system as a whole and when we’re attacking these problems, start to address that in program design.”

[J.B. WOGAN]

Several speakers at the forum pointed to rapid-cycle evaluation as a promising way to bring together different methods of evaluation, to learn while doing, and to support a process of continuous improvement. Emilie Bagby, the Director of Global Education for Mathematica, explained what the term rapid-cycle evaluation, or RCE for short, means, and why it can be a valuable approach.

[EMILIE BAGBY]

Rapid-cycle evaluation is really broad, and it can help facilitate learning at very different moments in the program lifecycle. It takes a systematic approach to learning, and it’s really best when it is iterative. “Rapid” can have a range of meanings. It can go from two days, a couple of weeks, maybe even a whole school year, depending on what you’re looking at. “Cycle” can reflect repeating the same approach multiple times or taking different approaches in a sequential basis to facilitate learning along the way. “Evaluation” is a really broad term, as we all know, and incorporates a full suite of research methods.

At early stages of implementation of program design, rapid-cycle evaluation can be used to help identify operational choices. During program implementation, if a program does or does not appear to be achieving the desired rough results, you can use it to make a proof point or test a new method, a new approach to what you’re doing. To test short-term program effectiveness when taking a program to scale or when seeking to adapt successful implementation efforts in a different context, we can use RCE.  Any tested modifications should be feasible to implement at program scale, and decision-makers themselves should be willing to adapt their programming and the choices that they’re making along the way based on the study. The learnings, therefore, must be accessible and relevant to decision-makers.

[J.B. WOGAN]

Melissa Chiappetta at USAID noted that rapid-cycle evaluation could be useful for testing individual components of an overall program before scaling the program. 

[MELISSA CHIAPPETTA]

So we like to understand, does this particular coaching activity work? Rather than, does this combination of coaching/teacher training/book delivery and all of these things work? Because we don’t always package everything together, right? So breaking down these individual questions into more short-term, quick questions and turnarounds through things like rapid-cycle evaluation makes a ton of sense.

[J.B. WOGAN]

Doug Besharov, who is co-editor-in-chief of the Oxford University Press Library on International Social Policy, noted that rapid-cycle evaluation also has the benefit of providing more immediate and impactful findings for current policy debates. 

[DOUG BESHAROV]

Ten- and 12-year impact evaluations, just whatever they’re good for – and they’re good for a lot of things – they’re not great on the Hill. The world changes in those 12 years. 

[J.B. WOGAN]

Producing studies in shorter timeframes, Doug said, 

[DOUG BESHAROV]

that’s really important to be relevant to the policy argument.

[J.B. WOGAN]

Several speakers at the forum emphasized that even the most rigorous evaluation approach requires a thoughtful communication strategy if the findings are to influence decision-makers and lead to the scale-up of effective programs that can improve people’s well-being. Even in the research phase, researchers can take steps to make their reports more relevant to their intended audiences in the policy world.

[MELISSA CHIAPPETTA]

We don’t take the time to make sure recommendations are feasible and actionable. 

[J.B. WOGAN]

That’s USAID’s Melissa Chiappetta again. 

[MELISSA CHIAPPETTA]

So this really means that we have to talk to the people who are going to be using the recommendations; working with, in the case of education, ministries of education and bringing them into the evaluation early on; making sure that the questions are the types of questions that are actually going to support them and that they’re bought into the evaluation over time.

[J.B. WOGAN]

Melissa also observed that final written products from evaluations tend to place less emphasis on helping readers understand what steps they should take based on the research findings. 

[MELISSA CHIAPPETTA]

Oftentimes when I read reports, it’s clear that there was a lot of time spent on findings and some time spent on conclusions; and then the recommendations were done in one day, right? 

[J.B. WOGAN]

Melissa said authors could do more to explain in concrete terms what their intended readers should do, based on the evidence. 

[MELISSA CHIAPPETTA]

The recommendations say things like, “The Ministry of Education should consider exploring how such and such works,” as opposed to, “The Ministry of Education should consider implementing X policy that has worked in X location that has the same context as they do and has these proven outcomes.”

[J.B. WOGAN]

Others at the forum underscored the importance of informing people about the research findings after reports are published.

[ALIX ZWANE]

Getting evidence into the zeitgeist may be just as important as funding “Scaleup,” capital S.

[J.B. WOGAN]

That’s Alix Zwane again, the Chief Executive Officer of the Global Innovation Fund. Alix shared a story from her time at the Bill & Melinda Gates Foundation, where the foundation supported the evaluation of a nonprofit that used subsidies and behavior change messaging to improve sanitation practices among low-income households in rural areas. At the time of the study, the conventional wisdom was that subsidies could not influence sanitation practices. But the study showed otherwise: subsidies worked better than behavior change messaging alone. Here’s Alix again.

[ALIX ZWANE]

Okay, that’s super helpful, should be part of grist for the mill. But for a variety of reasons, neither the researchers nor the foundation nor WaterAid, who was the implementer of that, really pushed this in any kind of advocacy way. It sort of was, okay, one of those cool papers, it was in Science, that now sort of.... But fast forward to 2020, all the major implementors have announced there’s a consensus that smart subsidies in sanitation work. It somehow became part of the ether and the zeitgeist, and none of us can sort of take credit. So, there’s nobody to attribute that to. We all contributed in various ways. But I think all of us need to be a little bit humble about what really is the causal chain about how we sort of move people’s minds about what “works.”

[J.B. WOGAN]

The experience led Alix to believe that when the research shows something works, more resources should go toward policy advocacy. 

[MICHELLE SAGER]

Communication is really important.

[J.B. WOGAN]

That’s Michelle Sager, the Managing Director for Strategic Issues at the U.S. Government Accountability Office. She also spoke at the forum. 

[MICHELLE SAGER]

You may have heard others use the term “evidence ecosystem.” As part of that evidence ecosystem, whether you are the researcher, whether you’re from another part of the U.S. Government conducting research, whether you’re an institution such as Mathematica, you can’t just complete an evaluation and then hope that people pay attention to it. You have to communicate those findings and recommendations to people who are in positions to act on them.

[J.B. WOGAN]

If you enjoyed hearing these highlights from January’s event, I would recommend checking out the full video recording, which is available at mathematica.org. I’ll include a link to the video in this episode’s show notes. I’ll also include more information about the new handbook that inspired the event and this podcast. 

Last month, we celebrated the five-year anniversary of Mathematica’s On the Evidence podcast. Whether this is the first time you’re hearing us, or you’ve been with us since 2019, thanks for listening. If you’re a fan of the show, please leave us a rating and review wherever you listen to podcasts. It helps others discover our show. To catch future episodes, subscribe by visiting us at mathematica.org/ontheevidence.

Show notes

Watch the full video recording from the January convening about improving the impact of social programs through a comprehensive approach to design and evaluation.

Learn more about Mathematica Global, the new name and identity of Mathematica’s international unit.

Listen to a previous episode of Mathematica’s On the Evidence podcast that features two of the speakers from the January event, Emilie Bagby and Melissa Chiappetta, discussing how the U.S. Agency for International Development and the International Rescue Committee are building on an evidence review from Mathematica to help local education leaders implement effective programs and policies in northern Central America that will reduce local violence and crime.

About the Author

J.B. Wogan

Senior Strategic Communications Specialist