Scaling changes everything. How can we predict its effects?

Yale researchers are exploring how to measure pilots with scale in mind

Policy makers at all levels know what it’s like to watch a scheme that did well at pilot stage fail when scaled up. What works for 30 children in Loughborough may not work for 1,000 in Lagos. Or even London.

That’s not because the ideas were poor in the first place, or because the pilots themselves were poorly run, but because scale affects projects in different ways, and so demands a different evaluative approach.

Scale affects how projects perform in part because the size of a project will change its impact fundamentally, but also because different cultures, geographies and economies will react to the same project in different ways. And scaling projects aimed at kids comes with particular challenges, from getting subjects’ consent to finding your audience.

But there’s no point experimenting if your results aren’t likely to be useful. So how can governments nail evaluations that better predict success at scale?

The next level

Expecting what worked in a small social policy pilot to be replicated at scale without adjustments to the program or its evaluation can be fatal.

Dr Stephanie Talbut, Senior Program Quality Manager at the charity Restless Development, which evaluates schemes around the world, argues that such reworking has to be properly resourced. Making sure there are well-trained staff who understand the challenges of scaling up is essential.

If you’re going to evaluate an idea that expands from a pilot of 30 children to a national program of thousands, you need continuity in how you approach your evaluation even though what you are evaluating will change.

Even though the program delivery framework will alter with expansion, the essential factors you are testing for can still be robustly measured. And that takes expertise. “Your indicators are going to change, your results framework is going to expand and you’re going to do so much more with it. That needs to be reflected in your staffing budget,” said Talbut.

A group of academics at Yale University in the US has taken on this challenge. The Yale Research Initiative on Innovation and Scale (YRise) is taking a scientific approach to scaling up programs for governments and other organisations across the globe.

YRise looks at how a small-scale pilot differs from a wider scheme, and tries to understand what changes between the well-managed environment of randomised controlled trials (RCTs) and large-scale implementation.

Ahmed Mushfiq Mobarak, Faculty Director at YRise, says that one example in early years provision is the training of nursery nurses. “When you scale things up, it starts to affect the market,” he said, “so wages might start falling if you train too many people in the same job.”

So the challenge for YRise is: “How do we develop a research design that can predict what happens if we go from 50 to 500 [nursery nurses] — how might wages adjust? We need to address the gap between the research outcomes that we get and the policy outcomes that we might expect to see at scale.”

YRise focuses on five areas: political economy — the impact a program has on politicians and policymakers; evidence — whether a program can be scaled with confidence in different places; network effects — the consequences, intended or unintended, for the wider environment in which the program runs; effects on the growth and welfare of the area in which it runs; and, finally, how much demand there is for the program.

Child’s play?

In early years, some programs are more easily scalable than others.

Mobarak says that, in terms of scalability, programs that can be delivered through kindergartens, parents or members of the community, rather than by a visiting expert, fare better. However, this is not without its difficulties.

“Some scaling questions might come up if a delivery organisation knows how to do a program well, but you then move to a setting where, to make it scalable and cost effective, you need to train local people. That might fundamentally change the program. So that also needs monitoring and researching.”

So it will be important to learn how devolving the delivery from external experts to local providers impacts — positively or negatively — on the effectiveness of a program.

Another key element when it comes to evaluating programs for children is the difficulty of gaining consent from subjects.

Securing parental consent to evaluation as well as to participation in a program is vital, Talbut said, as a five-year-old cannot give consent themselves. Securing the parent’s buy-in pays dividends, however, as “children are blindingly, horribly honest. You will get amazing data without the bias. They don’t care about pleasing you,” she added.

Talbut has found this particularly challenging when evaluating international programs, where differing languages, cultural norms and education and literacy levels present separate challenges.

Getting parental consent requires resources and careful planning.

“If you’re training or equipping others to deliver your program because it’s gone from pilot to city-wide, for example,” Talbut said, “those who do the delivery need to be fully trained on how to take consent, why you’re taking it and what happens with the data. Consent is about more than making sure a parent has signed a form, and those doing the delivery need to have that broader understanding of its role in the success of the program.”

All of this careful procedure has the potential to be undermined when performed at scale.

And both Talbut and Mobarak agree on the importance not just of evaluation but of making sure you have good data to evaluate in the first place.

At scale, one concern Dr Talbut raised was “Smurfette Syndrome”. It’s named for the fact that movies often have one overly amazing (and unrepresentative) woman who is there to answer questions of gender balance, just as Smurfette was the only female Smurf.

The female character may be brilliant and a leader, but when the rest of the cast are male, the overall depiction of women in the film remains unrepresentative. As Dr Talbut says, “when you scale up an evaluation you can lose meaningful case studies. So you end up highlighting the one good example — you end up with Smurfettes.”

Being able to scale up from pilots to national and international projects that achieve the same success rate is a goal all policy makers should be interested in.

As YRise continues to examine the science behind making this happen, it will be useful for all of us to have answers to the questions Professor Mobarak says all programs should first be interrogated with: “Is it worth it? Is it helpful? Is this the best way to improve children’s outcomes?”

If the answer is no, then a project is not useful at any scale. When the answer is yes, however, this is just the beginning of the challenge: ensuring a program improves children’s lives not just in small pockets, and understanding the science behind doing so at a grand scale. — Emma Burnell

(Picture credit: Foster and Asher/Deathtothestockphoto)
