Linking national population-based surveys to the United States’ National Death Index
Donna Miller
Building: Law Building
Room: Breakout 6 - Law Building, Room 022
Date: 2012-07-11 03:30 PM – 05:00 PM
Last modified: 2011-12-19
Abstract
The National Center for Health Statistics (NCHS) has developed a record linkage program designed to maximize the scientific value of the Center's population-based surveys. NCHS is currently linking various NCHS population-based surveys with death certificate records from the National Death Index (NDI).
Linkage of the NCHS survey participants with the NDI provides the opportunity to conduct a vast array of outcome studies designed to investigate the association of a wide variety of health factors with mortality.
Linkage of NCHS’s population-based surveys with the NDI is a complicated process, and many aspects of it must be addressed to produce high-quality files. For each linkage, the overall quality of the survey data to be matched is assessed. Name fields are reviewed for invalid names. Dates of birth and Social Security Numbers (SSNs), which serve as unique identifiers, are programmatically checked for invalid values. Next, evaluation studies are conducted to calibrate the probabilistic scoring algorithm. For example, a calibration sample in which the true vital status of every subject is known is identified and matched to the NDI, and the match results are compared with the subjects’ known vital status.
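The calibration step described above can be illustrated with a minimal sketch. It assumes that each calibration subject has a single best NDI match score and that a subject is classified as deceased when that score meets a cutoff; the score scale, the record layout, and the rule of minimizing total misclassifications are illustrative assumptions, not the procedure NCHS uses.

```python
from typing import NamedTuple, Sequence, Tuple


class CalibrationRecord(NamedTuple):
    score: float      # best NDI match score for the subject (hypothetical scale)
    truly_dead: bool  # known vital status from the calibration sample


def classification_counts(
    records: Sequence[CalibrationRecord], cutoff: float
) -> Tuple[int, int, int, int]:
    """Compare 'score >= cutoff means deceased' with the known vital status.

    Returns (true positives, false positives, false negatives, true negatives).
    """
    tp = sum(r.score >= cutoff and r.truly_dead for r in records)
    fp = sum(r.score >= cutoff and not r.truly_dead for r in records)
    fn = sum(r.score < cutoff and r.truly_dead for r in records)
    tn = sum(r.score < cutoff and not r.truly_dead for r in records)
    return tp, fp, fn, tn


def best_cutoff(records: Sequence[CalibrationRecord]) -> float:
    """Return the observed score with the fewest total misclassifications.

    Ties are resolved in favor of the lowest such score.
    """
    candidates = sorted({r.score for r in records})

    def errors(cutoff: float) -> int:
        _, fp, fn, _ = classification_counts(records, cutoff)
        return fp + fn

    return min(candidates, key=errors)


# Invented calibration sample, for illustration only.
sample = [
    CalibrationRecord(45, True),
    CalibrationRecord(40, True),
    CalibrationRecord(38, True),
    CalibrationRecord(33, False),
    CalibrationRecord(30, True),
    CalibrationRecord(28, True),
    CalibrationRecord(22, False),
    CalibrationRecord(15, False),
]

cutoff = best_cutoff(sample)                    # 28
counts = classification_counts(sample, cutoff)  # (5, 1, 0, 2)
```

In practice the cost of a false match and of a missed death may differ, so a weighted error criterion could replace the simple sum used here.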
Once the initial matches are made, additional steps are needed. NCHS establishes criteria for determining which records are clerically reviewed. For example, because NCHS collects decedent information from multiple sources, records with conflicting dates of death are often clerically reviewed. NCHS also conducts a variety of methodological studies to improve the initial match results and to assist in the clerical review. In the United States, a person’s 9-digit SSN is heavily relied on for matching records. NCHS has conducted several studies to assess the implications of not having the full 9-digit SSN, as well as the implications of matching on only six or four digits of the SSN.
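To make the partial-SSN comparison concrete, the following sketch grades the agreement between a survey SSN and an NDI SSN. It assumes that partial SSNs consist of the trailing six or four digits; the abstract does not say which digits are retained, and the function names and agreement labels are hypothetical.

```python
from typing import Optional


def digits_only(value: Optional[str]) -> str:
    """Strip separators and return only the ASCII digits of an SSN string."""
    if value is None:
        return ""
    return "".join(ch for ch in value if ch in "0123456789")


def ssn_agreement(survey_ssn: Optional[str], ndi_ssn: Optional[str]) -> str:
    """Return the strongest level at which two SSNs agree.

    Levels (hypothetical labels): 'full9', 'last6', 'last4', 'none', or
    'missing' when either value has no digits. Trailing digits are compared
    on the assumption that partial SSNs keep the end of the number.
    """
    a, b = digits_only(survey_ssn), digits_only(ndi_ssn)
    if not a or not b:
        return "missing"
    if len(a) == 9 and len(b) == 9 and a == b:
        return "full9"
    if len(a) >= 6 and len(b) >= 6 and a[-6:] == b[-6:]:
        return "last6"
    if len(a) >= 4 and len(b) >= 4 and a[-4:] == b[-4:]:
        return "last4"
    return "none"


full = ssn_agreement("123-45-6789", "123456789")  # 'full9'
partial = ssn_agreement("6789", "123-45-6789")    # 'last4'
```

Agreement on fewer digits is weaker evidence that two records belong to the same person, so a scoring algorithm would presumably assign it less weight than agreement on all nine digits.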
Finally, once NCHS has completed the linking of records to the NDI, additional work is undertaken before the final linked mortality files are made available. New eligibility adjustment weights are created to account for survey subjects who are ineligible for linkage to the NDI because of insufficient identifying data. Additionally, NCHS develops a data perturbation plan for the release of public-use linked mortality files. Data collected by NCHS are guaranteed by law to be held in the strictest confidence; therefore, any data files NCHS releases are evaluated for disclosure risks and subjected to data perturbation techniques.
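One way to picture an eligibility adjustment is as a ratio adjustment within cells: the weights of linkage-eligible subjects are inflated so that they reproduce the weighted total of all subjects in the same cell. The sketch below assumes this approach, and the field names and adjustment cells are hypothetical; the abstract does not describe the method NCHS actually uses.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Participant(NamedTuple):
    pid: str        # survey participant identifier (hypothetical)
    cell: str       # adjustment cell, e.g. an age-by-sex group (hypothetical)
    weight: float   # original survey sample weight
    eligible: bool  # True if identifying data were sufficient for linkage


def eligibility_adjusted_weights(people: List[Participant]) -> Dict[str, float]:
    """Ratio-adjust the weights of eligible participants within each cell.

    Ineligible participants receive a weight of 0, and each eligible
    participant's weight is multiplied by
    (weighted total of the cell) / (weighted total of eligibles in the cell).
    """
    total = defaultdict(float)
    eligible_total = defaultdict(float)
    for p in people:
        total[p.cell] += p.weight
        if p.eligible:
            eligible_total[p.cell] += p.weight

    adjusted = {}
    for p in people:
        if p.eligible:
            adjusted[p.pid] = p.weight * total[p.cell] / eligible_total[p.cell]
        else:
            adjusted[p.pid] = 0.0
    return adjusted


# Invented data, for illustration only.
people = [
    Participant("a1", "A", 100.0, True),
    Participant("a2", "A", 200.0, True),
    Participant("a3", "A", 100.0, False),
    Participant("b1", "B", 150.0, True),
    Participant("b2", "B", 50.0, True),
]

weights = eligibility_adjusted_weights(people)
```

With these invented data, the eligible subjects in cell A are scaled by 400/300, cell B is unchanged, and the adjusted weights still sum to the original weighted total of 600.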