ACSPRI Conferences, RC33 Eighth International Conference on Social Science Methodology


Investigations towards implementing the EM algorithm to estimate model parameters for data linking projects at the Australian Bureau of Statistics

Carrie Samuels

Building: Law Building
Room: Breakout 6 - Law Building, Room 022
Date: 2012-07-11 01:30 PM – 03:00 PM
Last modified: 2011-12-19


Data linking is the process of combining two or more data files to bring together records that belong to the same individual. At the Australian Bureau of Statistics (ABS), data linking is carried out under the banner of the Census Data Enhancement Project and involves linking Census data to administrative data sets. This linking is conducted within the probabilistic framework of the Fellegi–Sunter model, whose parameters (for each linking variable, the probabilities of agreement among true matches and among non-matches) must be estimated for each linkage project. The ABS has previously estimated these parameters from training data, but this method has limitations and drawbacks. The use of the EM algorithm to estimate the parameters of the Fellegi–Sunter model is well established in the literature. Following preliminary empirical investigations with synthetic data, the ABS decided to extend these investigations to real data, with a view to using the EM algorithm in its data linking production environment. In this presentation we will discuss the results of these empirical investigations and outline the current state of play regarding the use of the EM algorithm to estimate these parameters for ABS data linking projects.
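The abstract does not describe how the ABS implements the EM algorithm. As a rough illustration only, the sketch below shows the textbook EM approach for the Fellegi–Sunter model, assuming binary agree/disagree comparison vectors and conditional independence between linking fields. Here p is the proportion of record pairs that are true matches, m[j] is the probability that field j agrees for a true match, and u[j] is the probability that it agrees for a non-match. The function names (fellegi_sunter_em, match_weight, simulate_pairs), the starting values and the simulated parameters are all hypothetical and are not ABS figures.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

EPS = 1e-6  # keeps probabilities strictly inside (0, 1) so logs and ratios stay finite


def _clip(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def _likelihood(gamma: Sequence[int], probs: Sequence[float]) -> float:
    """P(gamma | class), assuming fields agree independently given the class."""
    out = 1.0
    for agree, q in zip(gamma, probs):
        out *= q if agree else 1.0 - q
    return out


def fellegi_sunter_em(
    gammas: Sequence[Sequence[int]],
    p: float = 0.1,
    m: Optional[Sequence[float]] = None,
    u: Optional[Sequence[float]] = None,
    max_iter: int = 500,
    tol: float = 1e-6,
) -> Tuple[float, List[float], List[float]]:
    """Estimate (p, m, u) from binary comparison vectors with EM (hypothetical helper)."""
    k = len(gammas[0])
    n = len(gammas)
    m = list(m) if m is not None else [0.9] * k  # assumed starting values
    u = list(u) if u is not None else [0.1] * k
    for _ in range(max_iter):
        # E-step: posterior probability that each pair is a true match.
        g = []
        for gamma in gammas:
            a = p * _likelihood(gamma, m)
            b = (1.0 - p) * _likelihood(gamma, u)
            g.append(a / (a + b))
        # M-step: closed-form updates weighted by the posteriors.
        sum_g = sum(g)
        agree_m = [0.0] * k
        agree_u = [0.0] * k
        for gi, gamma in zip(g, gammas):
            for j, agree in enumerate(gamma):
                if agree:
                    agree_m[j] += gi
                    agree_u[j] += 1.0 - gi
        new_p = _clip(sum_g / n)
        new_m = [_clip(x / sum_g) for x in agree_m]
        new_u = [_clip(x / (n - sum_g)) for x in agree_u]
        # Stop when no parameter moves by more than tol.
        shift = max(abs(x - y) for x, y in zip([new_p] + new_m + new_u, [p] + m + u))
        p, m, u = new_p, new_m, new_u
        if shift < tol:
            break
    return p, m, u


def match_weight(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi–Sunter log2 likelihood-ratio weight for one comparison vector."""
    return sum(
        math.log2(mj / uj) if agree else math.log2((1.0 - mj) / (1.0 - uj))
        for agree, mj, uj in zip(gamma, m, u)
    )


def simulate_pairs(n: int, p: float, m: Sequence[float], u: Sequence[float], seed: int = 0) -> List[List[int]]:
    """Hypothetical synthetic comparison vectors, for illustration only."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        probs = m if rng.random() < p else u  # draw the latent match status
        pairs.append([1 if rng.random() < q else 0 for q in probs])
    return pairs


if __name__ == "__main__":
    true_m = [0.95, 0.9, 0.9, 0.85]
    true_u = [0.05, 0.1, 0.02, 0.2]
    data = simulate_pairs(5000, 0.1, true_m, true_u, seed=1)
    p_hat, m_hat, u_hat = fellegi_sunter_em(data)
    print("p:", round(p_hat, 3))
    print("m:", [round(x, 3) for x in m_hat])
    print("u:", [round(x, 3) for x in u_hat])
```

The appeal over training data is that EM needs no clerically reviewed pairs: it treats match status as a latent class and recovers p, m and u from the comparison vectors alone. The price is the conditional-independence assumption and sensitivity to starting values. Starting with m above u, as here, keeps the "match" label attached to the high-agreement class.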