ACSPRI Conferences, RC33 Eighth International Conference on Social Science Methodology


A constructive approach to data linkage

Jamas Enright, Dariusz Bielawski

Building: Law Building
Room: Breakout 6 - Law Building, Room 022
Date: 2012-07-11 01:30 PM – 03:00 PM
Last modified: 2012-03-06


In contrast to the many theoretical papers on the idea of data linkage, we present several practical steps developed during data linkage projects at Statistics New Zealand.
Names and other fields can be cleaned to make linking possible; however, this is typically done one row at a time. In our approach, we show how we “roll up” several rows containing names for the same person into a single record, and then demonstrate how this rolled-up record is used in data linkage. (Statistics New Zealand uses the software package QualityStage. Although the exact comparisons will be specified as they are used in that package, the general principles will be clear.)
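As a rough illustration of the roll-up idea, the Python sketch below collapses several name rows for one person into a single record holding every name variant, so that a candidate record can be compared against all of them at once. It is not QualityStage's implementation: the row layout, the person identifier, the `roll_up` and `names_agree` helpers and the exact (case-insensitive) comparison are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: one row per recorded name for a person, e.g. the same
# person appearing with different spellings across several source records.
rows = [
    ("P1", "Jonathan", "Smith"),
    ("P1", "Jon", "Smith"),
    ("P1", "Jonathan", "Smyth"),
    ("P2", "Mere", "Ngata"),
]


def roll_up(rows):
    """Collapse all name rows for each person into one record of name variants."""
    people = defaultdict(lambda: {"first": set(), "last": set()})
    for person_id, first, last in rows:
        # Light cleaning: trim whitespace and upper-case before storing a variant.
        people[person_id]["first"].add(first.strip().upper())
        people[person_id]["last"].add(last.strip().upper())
    return dict(people)


def names_agree(person, first, last):
    """True if the candidate's first and last names each match any stored variant."""
    return (first.strip().upper() in person["first"]
            and last.strip().upper() in person["last"])


rolled = roll_up(rows)
print(sorted(rolled["P1"]["first"]))               # ['JON', 'JONATHAN']
print(names_agree(rolled["P1"], "jon", "Smyth"))   # True
```

Storing first and last names as independent sets means a combination such as “Jon Smyth” agrees even though it never appeared together in the source rows; keeping (first, last) pairs instead would be stricter. A production setup would also use approximate string comparisons rather than exact equality.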
For each linking pass, selecting a cut-off involves either clerical review or a theoretical model, which typically assumes that the scores of correct and incorrect links are normally distributed. In this presentation, we will demonstrate a more practical approach to determining the cut-off, with a particular focus on minimising the number of records that are linked incorrectly (false positives).
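One way to make the cut-off choice concrete, sketched here under our own assumptions rather than as the method described in this presentation, is to take a clerically reviewed sample of candidate pairs from one pass and pick the lowest weight at which the share of false positives among accepted links stays within a chosen tolerance. The `choose_cutoff` function, its inputs and the sample weights are all hypothetical.

```python
def choose_cutoff(reviewed, max_fp_rate):
    """Return the lowest weight cut-off whose accepted links keep the false-positive
    rate at or below max_fp_rate, or None if no cut-off meets the target.

    reviewed: (weight, is_true_match) pairs from a clerically reviewed sample.
    """
    # Highest weight first, so each prefix is "every pair at or above a cut-off".
    ordered = sorted(reviewed, key=lambda pair: pair[0], reverse=True)
    best = None
    accepted = false_pos = 0
    for i, (weight, is_match) in enumerate(ordered):
        accepted += 1
        if not is_match:
            false_pos += 1
        # A cut-off cannot separate tied weights, so only evaluate after the
        # last pair that shares this weight.
        if i + 1 < len(ordered) and ordered[i + 1][0] == weight:
            continue
        # Keep scanning after a failure: the rate can drop again further down.
        if false_pos / accepted <= max_fp_rate:
            best = weight
    return best


# Hypothetical reviewed sample for one pass: (match weight, clerical decision).
sample = [(18.2, True), (15.0, True), (12.4, True), (11.0, False),
          (9.7, True), (8.1, False), (6.5, False), (3.0, False)]
print(choose_cutoff(sample, 0.2))  # 9.7
print(choose_cutoff(sample, 0.0))  # 12.4
```

With a tolerance of 0.2, this sample gives a cut-off of 9.7, which accepts one false positive (the pair at 11.0) among five links; a tolerance of 0.0 raises the cut-off to 12.4. Because the false-positive rate is not monotonic in the cut-off, every candidate weight is checked rather than stopping at the first failure.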
Building on these steps, we will present a general approach for linking data consisting of first and last names, sex and date of birth.
Throughout, we will also comment on the benefits and limitations of processing the data in this way.
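For linking on first name, last name, sex and date of birth, a widely used probabilistic framework is Fellegi–Sunter weighting: each field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, and the sum is the score to which a cut-off is applied. The sketch below assumes this framework; the m- and u-probabilities are invented placeholders, the `field_weight` and `match_weight` helpers are hypothetical, and exact comparison stands in for the approximate comparisons a real linkage setup would use.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative placeholders, not estimates):
#   m = P(field agrees | the two records belong to the same person)
#   u = P(field agrees | the two records belong to different people)
FIELDS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "sex": (0.98, 0.5),
    "dob": (0.97, 0.0003),
}


def field_weight(agrees, m, u):
    """Log2 likelihood ratio; for m > u it is positive on agreement, negative otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b):
    """Sum the field weights, using exact case-insensitive comparison for simplicity."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        agrees = rec_a[field].strip().upper() == rec_b[field].strip().upper()
        total += field_weight(agrees, m, u)
    return total


a = {"first_name": "Mere", "last_name": "Ngata", "sex": "F", "dob": "1980-04-12"}
b = {"first_name": "MERE", "last_name": "Ngata", "sex": "F", "dob": "1980-04-21"}
print(round(match_weight(a, b), 2))
```

Here the transposed day in the date of birth contributes a weight of about -5 instead of about +11.7 under exact comparison; a partial-agreement comparison for dates would soften that penalty.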