Why misconceptions about population data can lead to bad outcomes
Misconceptions about population data can lead researchers to make mistakes when processing, linking, and analysing such data, potentially resulting in poor real-world decisions.
Data about whole populations are often seen as the new oil of the Big Data age. They allow companies to recommend products tailored to you and help governments gain insight into their citizens’ needs. Such population data have also been highly valuable for better understanding the spread and effects of the Covid pandemic.
Population data are generally not collected with research in mind. Rather, their primary use is for administrative or commercial reasons, such as billing patients for their doctor’s visits. As a result, data about people might be incomplete (children will not appear in employment records) and biased (homeless people are unlikely to pay electricity bills). There can also be multiple records for a single individual, for example when somebody moves or changes their name on getting married. People also provide their details in different forms; Robert, for example, might use his full name when providing his details to the government but go by Bob when shopping online. Over twenty issues originating from how data are captured have been identified.
But there are further misconceptions about how data are processed, many of them due to falsely assuming that data processing is error-free. Because much data processing involves humans, errors do happen for reasons such as time pressure, misunderstandings of requirements, or incorrect use of software, to name just a few. Often multiple population databases need to be linked to allow advanced data analysis, for example to explore the effects of people’s education and employment on their health. Such data linkage often involves complex methods and processes that lead to subtle technical problems which are easily missed.
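To make the Robert and Bob example concrete, here is a minimal Python sketch of why naive linkage can miss records that belong to the same person. Everything in it is an assumption for illustration only: the `NICKNAMES` table, the `normalise` and `same_person` helpers, and the two invented records are hypothetical and are not taken from the IJPDS article or from any real linkage software.

```python
# Hypothetical nickname table; real linkage systems rely on far larger
# lookup tables and on approximate string comparison.
NICKNAMES = {"bob": "robert", "rob": "robert", "liz": "elizabeth"}


def normalise(first_name: str) -> str:
    """Lower-case a first name and map known nicknames to one canonical form."""
    name = first_name.strip().lower()
    return NICKNAMES.get(name, name)


def same_person(rec_a: dict, rec_b: dict, use_nicknames: bool) -> bool:
    """Compare two records on first name, surname, and date of birth."""
    if use_nicknames:
        first_a, first_b = normalise(rec_a["first"]), normalise(rec_b["first"])
    else:
        first_a = rec_a["first"].strip().lower()
        first_b = rec_b["first"].strip().lower()
    return (
        first_a == first_b
        and rec_a["surname"].lower() == rec_b["surname"].lower()
        and rec_a["dob"] == rec_b["dob"]
    )


# Invented records: the same person as seen by two different data holders.
gov_record = {"first": "Robert", "surname": "Smith", "dob": "1980-04-12"}
shop_record = {"first": "Bob", "surname": "Smith", "dob": "1980-04-12"}

exact_match = same_person(gov_record, shop_record, use_nicknames=False)
nickname_match = same_person(gov_record, shop_record, use_nicknames=True)
print(exact_match, nickname_match)  # prints: False True
```

Exact comparison treats the two records as different people, while the nickname lookup links them. The opposite error is just as possible: two different people who happen to share a name and a date of birth would be merged by this rule, which is the kind of subtle problem that is easily missed.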
A guiding principle in science is that data need to be collected and processed in rigorous ways, ensuring the quality of any data analysis. However, the way population data are collected, and even how they are processed and linked, is commonly outside the control of a researcher. Properly conducting science when using population data can therefore be challenging. Remarkably, many of the misconceptions identified are due to the social nature of data collection and are therefore missed by purely technical solutions to data processing.
A recent IJPDS article, “Thirty-three myths and misconceptions about population data: from data capture and processing to linkage”, has been published by Professor Rainer Schnell and our Professor Peter Christen, who is part of the SCADR team working on the Scottish Historic Population Platform (SHiPP) programme and is a world-leading expert with decades of experience in working with population data. Their article describes over thirty misconceptions about population data and provides recommendations to help researchers and practitioners recognise and overcome such misconceptions.
Professors Christen and Schnell conclude:
Because good data management is a key aspect of good science, it is vital for anybody who uses population data to be aware of underlying assumptions concerning this kind of data. Our aim is to help identify and prevent misleading conclusions and poor real-world decisions being made, and ensure that population data will become the new oil of the Big data era.
Blog content taken from an IJPDS news item.
This article was published on 09 Feb 2023