Scottish Historic Population Platform (SHiPP)

In this project we aim to enhance historical administrative data - the Scottish Historic Population Platform (SHiPP) and Scottish civil registration records from 1855 to 1973 - for research use.

Research focus

Scottish civil registration (vital events) records of births, marriages and deaths from 1855 to 1973, which include a wealth of socio-demographic information, are being transcribed as part of the Digitising Scotland project. The ultimate goal is to make them research-ready as part of the Scottish Historic Population Platform (SHiPP). The scale of the task, however, requires automated methods for record linkage, for standardisation, and for encoding occupation and cause of death descriptions to international coding schemes.

  1. Developing a classification scheme for coding causes of death

The aim is to make SHiPP research-ready by coding the transcribed cause of death text in the 8 million death records spanning 1855-1973 into a classification scheme. Together with colleagues on the Studying the history of Health in Port cities (SHiP) project, we are helping to develop a historical extension of the WHO International Classification of Diseases, 10th revision (ICD-10), called the ICD historical international (ICD10hi) version.

ICD10hi is a historical extension of ICD-10 that will allow researchers to examine causes of death through time more easily than by working with raw cause of death data.

  2. Auto-coding causes of death and occupations

The SHPD digitised birth, marriage, and death records include textual descriptions of causes of death and occupations. To use these effectively for large-scale research they must be coded in a form suitable for statistical analysis. We will map the occupation descriptions to standard HISCO codes (based on ISCO 1968), the internationally recognised classification of occupations used in studying social stratification, and the cause of death descriptions to ICD10hi (outlined above). For both areas, we are applying machine learning techniques for automatic coding.

Part of this workstream also involves creating gold-standard hand-coded test data (occupations hand-coded to HISCO and causes of death hand-coded to ICD10hi) to be used in the machine learning. This part is led by our colleagues within the Centre for Data Digitisation and Analysis at Queen's University Belfast.

Auto-coding occupations to HISCO and causes of death to ICD10hi will allow researchers to work with SHPD through time more easily than by working with raw text strings.
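To illustrate the general idea of auto-coding from hand-coded examples, here is a minimal sketch in Python. It is not the project's method: it swaps in a simple nearest-neighbour match on word overlap in place of whatever machine learning techniques the project uses. The function names, the similarity threshold and the labels (LABEL_TB, LABEL_PERTUSSIS) are hypothetical placeholders, not real ICD10hi or HISCO codes.

```python
from collections import Counter
import math
import re


def tokens(text):
    """Lower-case a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def auto_code(description, gold, threshold=0.5):
    """Return the label of the most similar hand-coded example.

    Returns None when no example reaches the (assumed) similarity
    threshold, so that the record can be left for manual coding.
    """
    query = Counter(tokens(description))
    best_label, best_score = None, 0.0
    for text, label in gold:
        score = cosine(query, Counter(tokens(text)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Hypothetical hand-coded examples; the labels are placeholders only.
GOLD_CAUSES = [
    ("pulmonary tuberculosis", "LABEL_TB"),
    ("phthisis pulmonalis", "LABEL_TB"),
    ("whooping cough", "LABEL_PERTUSSIS"),
]

print(auto_code("Tuberculosis, pulmonary", GOLD_CAUSES))
print(auto_code("old age", GOLD_CAUSES))
```

In this sketch a description that matches no hand-coded example closely enough is returned uncoded rather than forced into a category, which is one way of keeping uncertain cases available for review.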

  3. Using the Scottish Historic Population Platform (SHiPP)

We will test the quality of these new data by looking at adverse early-life conditions and later-life outcomes. This is particularly important for Scotland given the nature of health inequalities.

Exposure to adverse conditions early in life, even in utero, has detrimental effects on later-life outcomes. It increases morbidity & mortality and compromises socio-economic attainment.

We aim to use natural experiments, such as the 1918 influenza pandemic, to investigate the impacts of adverse conditions on life expectancy, mortality by cause of death and achieved occupation. We also aim to establish whether a higher socio-economic status of parents can act as a mitigating factor.

 

Data sources

SHiPP has been created and digitised by the Digitising Scotland project and contains the Scottish statutory registers of births, deaths and marriages (1855-1973). For more information on these records, visit the Scotland’s People website.

 

What this will enable researchers to do

Enhancing the Scottish Historic Population Platform (SHiPP) will allow researchers to undertake research across all years using information transcribed from the original certificates coded to standard classification schemes, along with some quality assurance of the data.

This will save researchers the time otherwise needed to code text strings before researching topics such as social position (socio-economic status) over time and changes in causes of death by time period, area and social position.

 

Research team

Project Lead: Professor Chris Dibben and Bea Alex, Elaine Farrow, Zhiqiang Feng, Eilidh Garrett, Claire Grover, Beata Nowok, Richard Tobin and Lee Williamson.

Publications and outputs

When publications or outputs are available, they will be shared here. For more information about this project, please contact us.