EVENT - Quality aspects of administrative data *pilot course*

Date & time

Tuesday, July 23, 2024 - 09:00 to 17:00

Location

Venue: Edinburgh Climate Change Institute (ECCI), High School Yards, Edinburgh, EH1 1LZ.

Duration: One day, 09:00 to 17:00, including coffee and lunch breaks.

Mode: In-person delivery with interactive practical group working sessions. 

Capacity: The number of participants on this course is capped, with 20 places available.

Cost: Free (during piloting stage).

Please note that the ‘Quality aspects of administrative data’ course is initially being piloted in Edinburgh, with further courses in the piloting stage taking place in London, Wales and Southampton in August (tbc).

The first course in Edinburgh will initially be offered to members of the SCADR network, before being offered to the wider ADR Scotland network (subject to availability).

To register for the course, please contact: scadr@ed.ac.uk

Summary

This one-day course provides an introduction to data quality and how it can affect all aspects of working with administrative data.

As researchers and practitioners working with administrative data, we are often given datasets without knowing their full provenance: how the data were captured, what kind of processing has been applied, and whether they have been linked or merged with data from other sources. Complete and up-to-date metadata are not always available.

Not fully understanding the provenance of a dataset can lead to incorrect assumptions and misconceptions about its content and quality. This can result in incorrect processing and/or analysis, which in turn can lead to poor outcomes and poor decision making.

Course audience

This course is aimed at researchers and practitioners who work with administrative data, as well as those who manage data-centric systems in organisations that act as data custodians, or who are involved in the capture, processing, and linkage of data that may be used for administrative data research. The course requires little technical knowledge; all necessary technical background will be introduced during the course.

Course content

The course will cover data quality dimensions, including technical, social, and legal aspects; discuss frameworks that aim to quantify data quality; and provide examples and case studies showing how poor data quality can lead to bad outcomes in data science projects.

The course will not focus on technical aspects of data cleaning, data processing, or data linkage, but will instead highlight the issues researchers and practitioners need to be aware of when working with administrative data. It will provide and discuss a set of recommendations, and through interactive sessions participants will be able to share their own experiences of how data quality aspects have led to unexpected outcomes in projects they have worked on.

Course structure and topics covered

The course will be a mixture of four hours of interactive lectures (containing small practical exercises) plus two one-hour sessions with group discussions.

Lecture topics will cover (subject to change):

  • The data science workflow
  • Introduction to data wrangling and data analytics/mining
  • Overview of data quality aspects
  • Examples and case studies of what can go wrong in data science
  • Data quality dimensions, data quality assessments, data quality frameworks
  • How data capturing, data processing, and data linkage can affect data quality
  • Assumptions and misconceptions in data science and how to identify and prevent them
  • Recommendations on how to improve data quality aspects
  • What data scientists should check with their data sources
  • How to ensure data are research ready

Group discussions will cover:

  • Experiences of data quality issues encountered
  • How to implement recommendations within your own organisation

 

Presenter Biography

Peter Christen is the Research Lead on the Scottish Historic Population Platform (SHiPP) project, run at the Scottish Centre for Administrative Data Research (SCADR) at the University of Edinburgh. He is also a Professor at the School of Computing at the Australian National University in Canberra. Peter is a world-leading expert in record linkage with over 20 years' experience of working with administrative data. He has over 200 publications in the area of data science, including two books: "Data Matching" (2012) and "Linking Sensitive Data" (2020, co-authored with Thilina Ranbaduge and Rainer Schnell). As of May 2024, his work has attracted over 16,700 citations on Google Scholar.

This article was published on 20 May 2024
