EVENT - Quality aspects of administrative data pilot course

Date & time

Tuesday, July 23, 2024 - 09:00 to 17:00

Location

Venue: Edinburgh Climate Change Institute (ECCI), High School Yards, Edinburgh, EH1 1LZ.

Duration: One day, 9.00-17.00, including coffee and lunch breaks.

Mode: In-person delivery with interactive practical group working sessions.

Capacity: The number of participants on this course is capped, with 20 places available.

Cost: Free (during piloting stage).

Please note that the ‘Quality aspects of administrative data’ course is initially being piloted in Edinburgh, with further courses in the piloting stage taking place in London, Wales and Southampton in August (tbc).

The first course in Edinburgh will initially be offered to members of the SCADR network, before being offered to the wider ADR Scotland network (subject to availability).

To register for the course, please contact: scadr@ed.ac.uk

Summary

This one-day course will provide an introduction to data quality, and how it can affect all aspects of working with administrative data.

As researchers and practitioners working with administrative data, we are often given datasets without knowing their full provenance: how the data were captured, what processing has been applied, and whether they have been linked or merged with data from other sources. Complete and up-to-date metadata are not always available.

Not fully understanding the provenance of a dataset can lead to assumptions and misconceptions about its content and quality. This can result in incorrect processing and/or analysis, which in turn can lead to bad outcomes and poor decision making.

Course audience

This course is aimed at researchers and practitioners who work with administrative data, as well as those who manage data-centric systems in organisations that act as data custodians, or who are involved in the capture, processing, and linkage of data that may be used for administrative data research. The course requires little technical knowledge; all technical background will be introduced during the course.

Course content

The course will cover data quality dimensions, including technical, social, and legal aspects; discuss frameworks that aim to quantify data quality; and provide examples and case studies showing how (the lack of) quality data can lead to bad outcomes in data science projects.

The course will not focus on technical aspects of data cleaning, data processing, or data linkage, but will instead highlight the issues researchers and practitioners need to be aware of when working with administrative data. It will provide and discuss a set of recommendations, and through interactive sessions participants will be able to share their own experiences of how data quality aspects have led to unexpected outcomes in projects they have worked on.

Course structure and topics covered

The course will be a mixture of four hours of interactive lectures (containing small practical exercises) plus two one-hour sessions with group discussions.

Lecture topics will cover (subject to change):

The data science workflow
Introduction to data wrangling and data analytics/mining
Overview of data quality aspects
Examples and case studies of what can go wrong in data science
Data quality dimensions, data quality assessments, data quality frameworks
How data capturing, data processing, and data linkage can affect data quality
Assumptions and misconceptions in data science and how to identify and prevent them
Recommendations on how to improve data quality aspects
What data scientists should check with their data sources
How to ensure data are research ready

Group discussions will cover:

Experiences of data quality issues encountered
How to implement recommendations within your own organisation

Presenter Biography

Peter Christen is the Research Lead on the Scottish Historic Population Platform (SHiPP) project, run at the Scottish Centre for Administrative Data Research (SCADR) at the University of Edinburgh. He is also a Professor at the School of Computing at the Australian National University in Canberra. Peter is a world-leading expert in record linkage with over 20 years' experience in working with administrative data. He has over 200 publications in the area of data science, including the two books "Data Matching" in 2012 and "Linking Sensitive Data" (co-authored with Thilina Ranbaduge and Rainer Schnell) in 2020. As of May 2024, his work has attracted over 16,700 citations on Google Scholar.

This article was published on 20 Mar 2024

Categories and tags

Training
