Data coding and harmonization: How DataCoH and Charmstats are transforming social science data

Kristi Winters

Building: Law Building
Room: Breakout 3 - Law Building, Room 104
Date: 2012-07-10 11:00 AM – 12:30 PM
Last modified: 2012-04-03


Comparative social researchers often face the challenge of making key theoretical concepts comparable across nations and/or over time. One example is the socio-demographic variable ‘education’. To operationalize it, researchers must review multiple educational systems across nations and/or changing educational structures within one nation over time. Further, there are several ways to recode education into a harmonized variable, including (inter alia): the Hoffmeyer-Zlotnik/Warner matrix; the CASMIN education scheme; the International Standard Classification of Education (ISCED); or a harmonized variable provided by the dataset itself.

GESIS is developing two electronic resources to assist social researchers. The website DataCoH (Data Coding and Harmonization) will provide a centralized online library of data coding and harmonization for existing variables, to increase transparency and facilitate variable replication. DataCoH will initially contain socio-demographic variables used across the social sciences and will then expand to discipline-specific variables. The software program Charmstats (Coding and Harmonizing Statistics) will provide a structured approach to data harmonization by allowing researchers to: 1) download harmonization protocols; 2) document variable coding and harmonization processes; 3) access variables from existing datasets for harmonization; and 4) create harmonization protocols for publication and citation. This paper explains DataCoH and Charmstats and demonstrates how they work.
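
To make the idea of a documented harmonization protocol concrete, the following is a minimal sketch in Python. It is not Charmstats code and does not reflect its interface. The class name, the survey and variable names, the source codes, and the three-level target scheme are all hypothetical, chosen only to show how an explicit recode table turns a harmonization decision into something that can be inspected and replicated.

```python
from dataclasses import dataclass

# Hypothetical three-level target scheme, invented for illustration.
# It is NOT ISCED, CASMIN, or the Hoffmeyer-Zlotnik/Warner matrix.
TARGET_LABELS = {1: "low", 2: "medium", 3: "high"}


@dataclass(frozen=True)
class HarmonizationProtocol:
    """An explicit recode table from one source variable to the target scheme."""

    source: str     # hypothetical dataset name
    variable: str   # hypothetical variable name in that dataset
    mapping: dict   # source code -> target code

    def recode(self, values):
        """Recode source values; codes missing from the table become None."""
        return [self.mapping.get(v) for v in values]

    def undocumented_codes(self, values):
        """List source codes in the data that the protocol does not cover."""
        return sorted(set(values) - set(self.mapping))


# Two hypothetical surveys that code education differently.
protocol_a = HarmonizationProtocol("survey_a", "educ", {1: 1, 2: 1, 3: 2, 4: 3})
protocol_b = HarmonizationProtocol(
    "survey_b", "school", {10: 1, 20: 2, 30: 2, 40: 3, 50: 3}
)

raw_a = [1, 3, 4, 9]   # 9 is not covered by protocol_a
raw_b = [10, 30, 50]

harmonized_a = protocol_a.recode(raw_a)
harmonized_b = protocol_b.recode(raw_b)
labels_b = [TARGET_LABELS[code] for code in harmonized_b]
missing_a = protocol_a.undocumented_codes(raw_a)
```

Keeping the mapping as data rather than burying it in ad hoc recode statements means that the same table can be shared, cited, and checked against the codes that actually occur in a dataset.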