RC33 Eighth International Conference on Social Science Methodology

Changes in process-generated data: The importance of documentation

Andreas Helmut Schneider

Date: 2012-07-10 11:00 AM – 12:30 PM
Documentation is a central issue for process-generated data. Because researchers can neither plan nor influence how the data are collected and transformed, they need to know which steps the data have gone through in order to understand their characteristics. In particular, data problems that arise from processing must be made known to the researcher to ensure valid results.

Changes in the process of data generation pose a particular challenge for data documentation. For longitudinal data, changes that affect the structure and content of datasets relative to earlier versions have to be documented carefully. The documentation should allow the researcher to identify both cases and variables even if their structure has changed substantially.

The administrative data sources used by the Institute for Employment Research (IAB) are subject to frequent changes, especially in the data collection tools. A major change that took place in 2006, when the data collection software "coArb" was replaced by the new software "VerBIS", is presented as an example of the role of documentation. The severe effects of this change on the properties of the unemployment datasets, and the resulting difficulties in analyzing certain variables over time, have required considerable documentation effort and continue to do so.