Assessing Panel Survey Representativeness using Gold-Standard Data
Narayan Sastry, Denise Duffy
Building: Law Building
Room: Breakout 7 - Law Building, Room 028
Date: 2012-07-12 11:00 AM – 12:30 PM
Last modified: 2012-04-03
Abstract
Baseline nonresponse can fundamentally challenge the generalization of results and conclusions obtained from survey data. Even with high baseline and wave-to-wave response rates, panel studies face the problem of cumulative attrition. Furthermore, many on-going nationally-representative studies are challenged by flows of new immigrants—many segments of which are not automatically captured in the sampling frame. Assessing the overall representativeness of an on-going panel survey is thus challenging but has a high pay-off for data users and consumers of the study results.
In this paper, we exploit data from an independent gold-standard cross-sectional survey in the United States, the American Community Survey (ACS), together with a new statistical approach known as generalized boosted models to assess the national representativeness of the 2007 sample of children included in the Panel Study of Income Dynamics (PSID). PSID is a nationally representative panel of U.S. families that began in 1968 and had, by 2007, collected data on the same families and their descendants for 35 waves over 39 years. In 2007, PSID comprised approximately 8,500 family units with a total of 24,000 individual family members, including 7,100 children aged 0–17 years.
ACS represents a gold standard based on its extremely high (98%) response rate, excellent data quality and completeness, and large sample sizes (approximately 700,000 children aged 0–17 years in 2007). We have constructed a reasonably consistent set of covariates across the PSID and ACS to describe children based on their age, race, sex, poverty status, geographic region, and parents’ place of birth. The generalized boosted models that we use were developed initially for propensity models but are well suited to our analysis. They provide flexible, non-parametric estimates of the relationship between our dependent variable (an indicator of whether an observation in the pooled ACS-PSID sample came from the ACS) and a large number of covariates and their interactions, using an adaptive functional form. Model results also provide useful ways of assessing balance (in this case, balance between the attributes of the PSID sample members and those from the ACS) for each covariate included in the model. The resulting propensity scores can be used to assess overlap between the samples and to construct weights. Our results reveal that the PSID child sample provides good representation of the corresponding population, with coverage of over 95 percent of the population and reasonable balance for most groups, but with some notable exceptions.
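As an illustration of how propensity scores can be turned into weights and a balance check, the following minimal Python sketch may help. It is not the authors' code: the data are hypothetical toy values, the function names are invented for this example, and the propensity scores are simply assumed rather than estimated with a generalized boosted model. It also ignores the survey weights that a real ACS-PSID comparison would need to incorporate. Each PSID case is weighted by its odds of ACS membership, p / (1 - p), and balance on one covariate is summarized as a standardized difference in means.

```python
from math import sqrt


def odds_weights(p_acs):
    """Weight each PSID case by its odds of ACS membership, p / (1 - p)."""
    return [p / (1.0 - p) for p in p_acs]


def weighted_mean(values, weights):
    """Weighted mean of a covariate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_difference(ref_values, psid_values, psid_weights=None):
    """(Weighted PSID mean - reference mean) / reference standard deviation."""
    if psid_weights is None:
        psid_weights = [1.0] * len(psid_values)
    ref_mean = sum(ref_values) / len(ref_values)
    ref_var = sum((v - ref_mean) ** 2 for v in ref_values) / (len(ref_values) - 1)
    return (weighted_mean(psid_values, psid_weights) - ref_mean) / sqrt(ref_var)


# Hypothetical toy data: 1 = child has a foreign-born parent, 0 = otherwise.
acs_covariate = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # reference sample (mean 0.4)
psid_covariate = [0, 0, 1, 0, 0, 0, 0, 1]        # panel sample (mean 0.25)

# Assumed propensity scores P(from ACS | covariate) for the PSID cases.
# In practice these would come from a fitted model on the pooled sample.
psid_propensity = [2 / 3 if x == 1 else 0.5 for x in psid_covariate]

weights = odds_weights(psid_propensity)
before = standardized_difference(acs_covariate, psid_covariate)
after = standardized_difference(acs_covariate, psid_covariate, weights)

print("standardized difference, unweighted:", round(before, 3))
print("standardized difference, weighted:  ", round(after, 3))
```

In this toy case the assumed scores equal the true pooled-sample membership probabilities, so the weighted PSID mean matches the reference mean. With estimated scores, balance would typically improve without becoming exact, and cases with scores close to 1 would signal poor overlap between the samples.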