The UK’s first census in 1801 was supposedly carried out to count how many people were eligible to fight in the Napoleonic wars. In reality, there were 12 justifications. One of these, that “the intimate knowledge of any country must form the rational basis of legislation and diplomacy,” still holds true today.
“That data obviously underpins so much,” said Professor Andy Tatem of the University of Southampton. “The very first, basic part is to come up with estimates of how many people there are. The country’s GDP, how many representatives it has in parliament, how money gets allocated to different parts of the country, disease metrics – anything the government does has population data underlying it.”
Every 10 years since 1801 – with the exception of 1941, in the midst of the Second World War – the UK has counted its population by surveying households. But the next census will be the last. After 2021, the UK will instead harvest the data people leave behind as they live their everyday lives. The US will hold its last survey in 2020. The Scandinavian countries, ever ahead, have already moved on.
Breaking the con-census
The idea is a simple one. Rather than survey citizens, statisticians collect the data traces left behind by people’s everyday interactions with government: records from welfare and tax departments, housing and vehicle registrations, and health records. By aggregating all this information and anonymising it to protect citizens’ privacy, statisticians can glean more than they do from asking everyone on paper.
“The benefits that the approach would give you are much more frequent statistics, so rather than doing a census every 10 years, you could potentially do that every year. You could do a range of new things; bring in topics you don’t already ask in the census,” said Becky Tinsley, of the UK’s Office for National Statistics (ONS).
The UK isn’t the first country to try this. Sweden began to survey populations this way shortly after 2000, and Finland even earlier: its 1990 census didn’t require direct data collection from the population. Other countries are following suit: the US will try the approach alongside traditional methods in its 2020 census. The UK government believes an administrative data census will cost two-thirds as much as an online survey.
The UK’s ONS has been preparing for the changes since 2011. For them to be successful, it needs to secure access to data from multiple departments, link it by ensuring it all fits into a standard format and complete legal consultations to protect citizens’ privacy. The ONS is on course to achieve these by the mid-2020s. The key question remains whether or not these methods are accurate enough.
Tracking mobile phone traces
Accuracy is a challenge because the ONS is now experimenting with data from entirely new, and surprising, sources.
In the summer of 2017, the ONS trialled the use of mobile phone data to infer the movement of commuters across London. Commuting flows are included in the existing census survey – however, the results are often inconsistent and lacking in detail. The ONS needed a method that was as accurate as a survey, but quicker and less resource-intensive. Mobile phone data could allow statisticians to provide more up-to-date information and bring in topics not already covered in the census.
“It was already known in 2011 that commuting data was something that was very hard to get from admin data,” said Susan Williams, who works on the Big Data team at ONS. “We could see that there were lots of parallels between what public transport bodies were doing. So there was a big push in 2014 when we started talking to mobile phone operators to see what state they were in, in terms of using this sort of information.”
The ONS used data from Vodafone to see how people moved from their area of residence to their main workplace area. The data is created when phones interact with the mobile phone network towers, marking an approximate location and a timestamp which can be used to assess movement from place to place. Citi Logik, an analytics firm contracted by ONS, then used anonymised records from Vodafone to produce the aggregated data.
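The article does not describe the algorithm Citi Logik used, so the following is only a minimal sketch of the general idea: take timestamped, anonymised network events, guess each device's home area from where it is seen at night and its work area from where it is seen during the working day, then publish only aggregated flows. The record layout, the hour windows, the `MIN_COUNT` suppression threshold and all names here are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, already-anonymised events: (device_token, hour_of_day, area).
PINGS = [
    ("a", 2, "Croydon"), ("a", 3, "Croydon"), ("a", 11, "Camden"), ("a", 14, "Camden"),
    ("b", 1, "Croydon"), ("b", 10, "Camden"),
    ("c", 4, "Hackney"), ("c", 12, "Westminster"),
]

NIGHT_HOURS = range(0, 6)   # assumed proxy for "at home"
WORK_HOURS = range(9, 17)   # assumed proxy for "at work"
MIN_COUNT = 2               # assumed threshold: smaller flows are suppressed


def modal_area(areas):
    """Return the most frequently observed area, or None if there are none."""
    return Counter(areas).most_common(1)[0][0] if areas else None


def commuting_flows(pings, min_count=MIN_COUNT):
    """Aggregate (home area, work area) counts and drop small flows."""
    night, day = defaultdict(list), defaultdict(list)
    for token, hour, area in pings:
        if hour in NIGHT_HOURS:
            night[token].append(area)
        elif hour in WORK_HOURS:
            day[token].append(area)

    flows = Counter()
    for token, night_areas in night.items():
        home = modal_area(night_areas)
        work = modal_area(day.get(token, []))
        if home and work:
            flows[(home, work)] += 1

    # Only aggregated counts leave this function, never individual devices.
    return {pair: n for pair, n in flows.items() if n >= min_count}


print(commuting_flows(PINGS))  # {('Croydon', 'Camden'): 2}
```

With these made-up records, the single Hackney to Westminster commuter is suppressed because the flow falls below the threshold, which is one simple way of publishing patterns rather than people.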
“As we bring that data in we only have access to anonymised data. We’re not interested in people, we’re interested in patterns,” said Tinsley.
The results were promising. The trial showed that the mobile phone data correlated well with commuting flows over longer distances, but overestimated commutes over shorter distances, possibly because it misclassified journeys made by students or other non-workers. Improvements to the algorithm that processes the data may be able to strip this effect out.
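One simple way to look for this kind of distance-dependent bias is to compare the two sources band by band. The sketch below uses invented figures, not ONS results, and the distance bands and function names are assumptions; it computes the ratio of mobile-derived commuters to census commuters in each band, so a ratio well above 1 flags overestimation.

```python
from collections import defaultdict

# Hypothetical flows: (distance_km, census_commuters, mobile_commuters).
FLOWS = [
    (1.5, 100, 150),
    (3.0, 200, 260),
    (12.0, 400, 410),
    (25.0, 300, 294),
]

# Assumed distance bands: (lower bound, upper bound, label).
BANDS = [
    (0, 5, "under 5 km"),
    (5, 20, "5 to 20 km"),
    (20, float("inf"), "20 km and over"),
]


def ratio_by_band(flows, bands=BANDS):
    """Return mobile/census commuter ratios for each distance band."""
    census_total, mobile_total = defaultdict(int), defaultdict(int)
    for distance, census, mobile in flows:
        for low, high, label in bands:
            if low <= distance < high:
                census_total[label] += census
                mobile_total[label] += mobile
                break
    # Skip bands with no census commuters to avoid dividing by zero.
    return {
        label: mobile_total[label] / total
        for label, total in census_total.items()
        if total
    }


for label, ratio in ratio_by_band(FLOWS).items():
    print(f"{label}: {ratio:.2f}")
```

In this made-up data the shortest band has a ratio of about 1.37 while the longer bands sit close to 1, the same shape of discrepancy the trial reported.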
Filling in the jigsaw
There’s huge potential value in using this kind of data due to its timeliness. Population changes minute by minute: census statistics only every 10 years. With so many key decisions based on population data, up-to-date statistics make better policy.
“In more developed countries you have data from more reliable censuses that are done more regularly, but you don’t have information on a population’s dynamics, where people are, where they live at night, where they go in the daytime,” said Tatem.
Collecting the data itself is only half the work. A great deal of effort must go into combining it with other sources, in order to answer real questions.
“One data set on its own has some useful stuff in it, but has drawbacks,” said Tatem. “Census data can capture numbers of people, but it’s often outdated. Mobile data can come in and bring things up-to-date, satellite data can capture really fine-detailed locations of buildings but not who’s living in them. Bringing data from different sources can fill in that gap and get more detail in space and in time.”
When data from different sources is combined in this way, it can transform government. Next year in Santiago, a collaboration between the city, policy labs and a mobile phone provider will separate women’s commuting data from the crowd to help planners understand how women move around and make the city safer for them. Combined data can also be used to show how populations react to disease and environmental catastrophes.
There are difficulties. Such data is notoriously hard to access and, when access is possible, often expensive. “Sixty to seventy percent of efforts will fail, and those that do succeed can take about a year of setting up the right legal agreements, getting the right sign-offs from agencies,” said Tatem.
The ONS has a policy of not paying for the data it uses. In the commuting flow pilot, the agency paid a fee to Citi Logik to anonymise and aggregate data but did not pay for the data itself. Although an administrative data census saves money over the long term, it requires investment to get there.
Policymakers also need to know the limitations of these approaches. “When we do get data, it’s very biased data. It’s sim cards moving around,” said Tatem. “There’s a lot of excitement with these new data sources, but you still need traditional data sources to tie them to reality, so you know what those sim cards actually represent. Is it a family moving around, an individual? Who is that sim card representing?”
A recent report by the Urban Institute in the US questioned the effect of an administrative census on minority groups. Data from existing administrative interactions can contain biases and miss groups who fall through the gaps. Statisticians need to take care to work out who exactly is covered by a data set, and how they can fill in blank spots.
Failing to do so means vulnerable groups within societies are ignored when policy is made. In some cases, the gaps can be huge. “There are countries throughout the world – like Afghanistan, which hasn’t had a census since 1979, and the Democratic Republic of Congo, which hasn’t since 1984 – that don’t have any information on how many people live in the country, let alone know where they are,” said Tatem.
In the long term, the benefits of an administrative data census are clear to see: quicker, regular statistics at a lower cost. It remains a crucial step for governments to take to draw an accurate and up-to-date picture of all the people they serve.
Picture credit: Flickr/Aftab Uzzaman