Classifying and autocoding causes of death
In this project we aim to enhance historical administrative data by classifying and autocoding causes of death to aid research use.
Research focus
Making socio-demographic records research-ready by developing automated methods for record linkage, standardisation and encoding of occupation and cause-of-death descriptions to international coding schemes.
- Developing a classification scheme for coding causes of death
The aim is to make SHiPP research-ready by coding the transcribed cause-of-death text to a classification scheme for the 8 million death records in the sample, which span 1855-1973. To do this, we are helping colleagues on the Studying the history of Health in Port cities (SHiP) project to develop a historical extension of the WHO International Classification of Diseases, 10th revision (ICD-10), called the ICD historical international (ICD10hi) version.
As a historical extension of ICD-10, ICD10hi will allow researchers to examine causes of death over time more easily than they could by working with raw cause-of-death text.
- Auto-coding causes of death and occupations
The Scottish Historic Population Database (SHPD) has digitised birth, marriage, and death records which include textual descriptions of causes of death and occupations. To use these effectively for large-scale research they must be coded in a form suitable for statistical analysis. We will map occupation descriptions to standard HISCO codes (based on ISCO 1968), the internationally recognised classification of occupations used in studying social stratification, and cause-of-death descriptions to ICD10hi (outlined above). For both areas, we are applying machine learning techniques for automatic coding.
Part of this workstream also involves creating gold-standard hand-coded test data (occupations hand-coded to HISCO and causes of death hand-coded to ICD10hi) to be used in the machine learning. This part is led by our colleagues at the Centre for Data Digitisation and Analysis, Queen's University Belfast.
The application of auto-coding to code occupations to HISCO and causes of death to ICD10hi will allow researchers to work more easily with SHPD through time rather than having to work with raw text strings.
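As a rough illustration of what automatic coding from hand-coded examples can look like, the sketch below assigns a label to a free-text cause of death by finding the most similar hand-coded string, using character trigram overlap so that spelling variants still match. It is not the project's method: the function names, the similarity threshold, the example strings and the labels (simple stand-ins, not real ICD10hi codes) are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical hand-coded examples. The labels are illustrative
# stand-ins, NOT real ICD10hi codes.
GOLD_STANDARD = [
    ("phthisis pulmonalis", "TB"),
    ("consumption", "TB"),
    ("pneumonia", "PNEUMONIA"),
    ("broncho-pneumonia", "PNEUMONIA"),
    ("whooping cough", "WHOOPING_COUGH"),
    ("pertussis", "WHOOPING_COUGH"),
]


def char_ngrams(text, n=3):
    """Return a bag of character n-grams for a lower-cased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram bags (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def autocode(description, gold=GOLD_STANDARD, min_score=0.3):
    """Return (label, score) for the closest hand-coded example.

    The label is None when the best score is below min_score (an assumed
    cut-off), so that such strings can be left for manual coding.
    """
    query = char_ngrams(description)
    score, label = max(
        (cosine(query, char_ngrams(text)), label) for text, label in gold
    )
    return (label if score >= min_score else None), score


if __name__ == "__main__":
    for raw in ["Hooping cough", "Phthisis", "Pneumonia (lobar)"]:
        print(raw, "->", autocode(raw))
```

A nearest-example matcher is used here only because it is short and easy to inspect; a trained classifier evaluated against held-out gold-standard data would be the more usual choice at the scale of millions of records.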
Data sources
The Scottish Historic Population Platform (SHiPP), created and digitised by the Digitising Scotland project, contains the Scottish statutory registers of births, deaths and marriages (1855-1973). For more information on these records visit Scotland’s People website.
Civil registration of births, deaths and marriages was introduced in Scotland in 1855. For Scotland we derived numbers of births and early-age deaths from the database created by the Digitising Scotland project. The Digitising Scotland (DS) data and research access provided by staff of the Longitudinal Studies Centre - Scotland (LSCS) is acknowledged. The LSCS is supported by the ESRC, and DS by ESRC grant ES/K00574X/2. Summary mortality data for Scotland were taken from the Human Mortality Database.
Boundary data for Scotland were created for the project by Max Satchell from a set of Continuous Registration Districts devised by Eilidh Garrett. The dataset was created from 'Britain's First Demographic Transition' with funding from the ESRC, and was further enhanced by Scotland's Parish Populations: parish boundaries, 1755–1891 (2019), and deposited at National Records of Scotland (an enhanced version of I-CeM Scotland 1891).
What this will enable researchers to do
Coding the original certificates to standard classification schemes will provide some quality assurance of the data and will save researchers the time otherwise spent coding text strings before they can study topics such as social position (socio-economic status) over time and changes in causes of death by time period, area and social position.
Research team
Project Lead: Professor Chris Dibben and Professor Peter Christen, Bea Alex, Zhiqiang Feng, Eilidh Garrett, Charini Nanayakkara and Lee Williamson.
Publications, Outputs and Media Coverage
When publications or outputs are available, they will be shared here. For more information about this project, please contact us.