Use of DDI structured metadata in the production of public-use data files
Peter Granda
Building: Law Building
Room: Breakout 6 - Law Building, Room 022
Date: 2012-07-12 01:30 PM – 03:00 PM
Last modified: 2012-06-20
Abstract
Recent changes in the purpose and structure of the Data Documentation Initiative (DDI) now enable the creation of metadata which can fully describe the history of a survey from study concept through data collection, processing, distribution, archiving, and repurposing of existing data.
The presentation will focus on the advantages of using DDI in the transition between the end of the survey data collection process and the creation of public-use files that are made available to the social science research community for secondary analysis. When cross-national surveys are conducted, they are almost invariably merged into a single integrated, public-use data file that contains respondents from all countries. This process is often poorly documented in two respects: (1) what procedures data producers used to create the integrated file; and (2) how they decided which variables to include in public-use files, which to drop, and which to recode for confidentiality concerns, and how best to inform users of all of these decisions. The same issues arise when surveys in one country are harmonized into a single public-use file or when survey questionnaires change over time to capture new concepts and measures.
A case study, which describes a single-country survey but is also applicable cross-nationally, will illustrate these issues and describe a hybrid system that uses DDI as an interchange mechanism to document variable-level information for a survey that uses a BLAISE-programmed instrument for data collection. Because this survey is conducted through continuous interviewing, metadata must be updated continually as new variables, code values, or constructed variables are added or modified during the course of data collection.
Since all of the metadata is stored in a repository in a software-independent format and manipulated in DDI, this example illustrates how data producers can provide greater transparency to users when describing the creation of public-use files. Such transparency improves quality from both the data processing and data analysis perspectives, facilitates good preservation practices, and encourages further reuse and repurposing of this material at a later time by new generations of survey researchers.