• Opinion
  • January 30, 2019

Data science and programming for policymakers: a practical introduction

Opinion: Programming can help you make the most of your data

This piece is by Dr James Smith, Head of Engineering, and Dr Padraig Alton, Data Scientist at Apolitical. Dr Smith has 20+ years' experience in tech, most recently as Head of Labs at the Open Data Institute. Dr Alton is an astrophysicist specialising in Bayesian statistics.

The article is a summary of an interactive workshop Apolitical ran in January 2019. You can access the recording, slides and interactive guide here.

What is programming?

Programming (or coding) is the method of telling computers what to do. We give them inputs, they produce outputs, and something in the middle decides how to turn one into another. That something in the middle is what we create when we’re programming.
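To make this concrete, here is a minimal sketch in Python (our choice of language for illustration; the workshop may have used something different). The function is the "something in the middle": it takes an input and decides how to turn it into an output.

```python
# A program is a rule for turning inputs into outputs.
# Here, the input is a temperature in Celsius and the
# output is the same temperature in Fahrenheit.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

print(celsius_to_fahrenheit(20))  # prints 68.0
```

Give it the same input and it will always produce the same output, which is exactly the repeatability discussed below.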

Why do we write code and why is it useful day-to-day?

In short, repeatability, automation and speed.

If you give the machine the same inputs it will produce the same outputs. Because it’s repeatable and predictable we can use it for automating common tasks and also do those tasks very quickly; we don’t have to work out the same thing time and again.

You can do a lot of simple repeatable or automated tasks in a spreadsheet, but by learning a bit of programming you can go a lot further. Your programming journey might start with similarly simple tasks, but it’s an entry into a much larger world.
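As a toy illustration of that journey, here is a spreadsheet-style task, totalling and averaging a column of figures, done in a few lines of Python (the figures are invented):

```python
# Apply the same calculation to every entry, repeatably and instantly,
# instead of dragging a spreadsheet formula by hand.
monthly_costs = [1200.50, 980.75, 1100.00, 1350.25]  # illustrative figures

total = sum(monthly_costs)
average = total / len(monthly_costs)
print(f"Total: {total:.2f}, Average: {average:.2f}")
```

The same script works unchanged whether the list holds four entries or four million, which is where code starts to outgrow the spreadsheet.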

Why do you need to know this as a policymaker if you’re not programming day-to-day?

This is a question of literacy for the modern world. These days, computers are everywhere. Our cars are computers that can drive. Aeroplanes are computers that can fly. There are computers in your house and in your pocket.

The internet and the web have fundamentally changed the way we communicate. And understanding a bit of programming, a bit of what’s going on inside the black box, will help you develop an intuition about what machines are doing, what they’re capable of, and what their limitations are.

All of which are relevant if you’re trying to apply and generate policy ideas that are fit for the future.

What about data?

We hear a lot about data these days. Data are simply “facts and statistics collected together for reference and/or analysis”. But that’s a broad definition! Data can take a wide variety of forms: numbers, text, images — all sorts.

Indeed, one of the most important trends in our society right now is the rise of “Datafication”, a phrase coined in 2013 to describe the collection and storage of data from an ever wider variety of aspects of daily life. This data doesn’t simply appear; it’s created. It’s important to understand how it was created, when, and by whom.

Buildings have sensors strewn throughout them now, as do cars, creating data on infrastructure usage which can be used to detect failures (or to predict them in advance). Supermarket loyalty schemes have turned our purchasing habits into data. Twitter has even turned our idle thoughts into data!

There’s data on everything, increasingly all stored electronically, and ever more widely accessible.

But what’s all this got to do with public service?

The answer, first and foremost, is that policy works best when it’s based on evidence: that is, when you are fairly certain that a particular policy you are implementing will contribute towards achieving your policy aims. The thing that provides the evidence is data.

Data-driven policy isn’t a new idea! Let’s go all the way back to 1854 and a key moment in evidence-based policy-making, from the public health sphere.

In 1854, there was an outbreak of cholera in Soho, London. The disease had first arrived in Britain in 1831 during a global pandemic and had killed thousands of people. This new, acute outbreak killed hundreds of people in a matter of days — a serious crisis.

No-one understood how disease spread, and the prevailing theory was that it had something to do with bad vapours, or miasma. However, Dr John Snow had a different theory, and he set out to investigate.


Picture credit: Wikimedia Commons

Here is his original map, which is really an infographic: the black bars show the number of cholera cases at different properties in the district. At the epicentre of the outbreak, on Broad Street, a water pump is marked. Snow’s theory was that cholera was water-borne, and the data provided the necessary evidence.
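The core of Snow’s analysis, tallying cases by location and finding the hotspot, is something you can sketch in a few lines of modern Python. The case list below is entirely hypothetical, invented for illustration:

```python
from collections import Counter

# Hypothetical case records: one entry per cholera case,
# recording the street where it occurred.
cases = [
    "Broad Street", "Broad Street", "Marshall Street",
    "Broad Street", "Berwick Street",
]

tally = Counter(cases)
print(tally.most_common(1))  # the street with the most cases
```

The counting that took Snow days of door-to-door legwork is now a one-liner; collecting trustworthy data in the first place remains the hard part.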

“I had an interview with the Board of Guardians of St. James’s parish, on the evening of Thursday, the 7th September, and represented the above circumstances to them. In consequence of what I said, the handle of the pump was removed on the following day.” — Dr John Snow

John Snow was said not to be a charismatic speaker, but a good data visualisation can be very persuasive. The local government officials removed the pump handle and the outbreak subsided.

It would be a simplification to claim that his theories were widely accepted overnight, or that policy pivoted dramatically. Nevertheless, a year later legislation was passed that paved the way for the construction of London’s Victorian-era sewers — although, in truth, this may have had as much to do with the terrible smell outside parliament as public health per se…

So what’s changed since 1854?

First, the democratisation of programming and the spread of open data mean that these days it isn’t only experts who can gain the insights hidden away in raw data.

Another major change is the rise of “Big Data”. At one level, Big Data just means we’re collecting an awful lot more of it than ever before. But the phrase hints at the incredible insights you can derive from data if you only have enough of it.

Data-at-scale is driving many of the key technological changes in our societies: the automation of decision-making, the rise of smart cities, the development of driverless cars.

How is it doing that?

It’s not just about how much data you have, but also how you analyse it. Programming can help you make the most of your data.
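As a small example of what "analysing it" can mean in practice, here is a group-average calculation using only Python's standard library (the survey records are invented):

```python
# A simple analysis: the average score per region.
from statistics import mean

records = [  # hypothetical survey results
    {"region": "North", "score": 72},
    {"region": "North", "score": 68},
    {"region": "South", "score": 81},
    {"region": "South", "score": 77},
]

# Group the scores by region...
by_region = {}
for row in records:
    by_region.setdefault(row["region"], []).append(row["score"])

# ...then average each group.
averages = {region: mean(scores) for region, scores in by_region.items()}
print(averages)
```

Swap in a real dataset with millions of rows and the same few lines still work, which is the point: the analysis scales with the data.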

This isn’t merely about speeding up data analysis. A human analysis of a small, relevant dataset can still yield valuable, life-saving insights — and that will remain the case for the foreseeable future. But at the intersection of programming and data analysis lie machine learning and artificial intelligence.

The automatic production and refinement of data analyses allow for faster, smarter decision making — and better predictions of, and responsiveness to, events. This is revolutionary, and at Apolitical we regularly publish stories on cutting-edge policy applications of machine learning and AI. These are the key technologies driving the societal changes that we’re seeing, and it’s important for public servants to be literate in them.

The immense, and growing, capabilities of AI may seem mysterious. But if you understand how to do simple data analysis using a programming language, you’re more than halfway to de-mystifying how AI works — and as a corollary, understanding what its potential and its limitations are.
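To show how short the first step is, here is a toy version of "learning" from data: fitting a straight line by least squares so the machine can predict unseen values. The numbers are made up for illustration, but the technique underlies a great deal of real machine learning.

```python
# Toy illustration of learning from data: fit a straight line
# (ordinary least squares) through observed points, then predict.
xs = [1, 2, 3, 4, 5]              # e.g. years
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # e.g. observed demand (invented)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Use the fitted line to predict a value we never observed.
print(f"predicted value at x=6: {intercept + slope * 6:.1f}")
```

Modern AI systems fit vastly more complicated curves through vastly more data, but the shape of the idea — learn parameters from examples, then predict — is the same.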

So dive in! You can find the recording, slides, and interactive guide here. Apolitical has also compiled a list of free resources and courses to take things further, on data and digital skills and on AI. — James Smith & Padraig Alton

(Picture credit: Unsplash)

