Application of 'external validity' with a calculated indicator
Andrey Aleksey Veykher
Building: Law Building
Room: Breakout 6 - Law Building, Room 022
Date: 2012-07-12 11:00 AM – 12:30 PM
Last modified: 2012-06-19
Abstract
Nonresponse caused by refusals has become the main reason surveys fail to provide representative data.
Refusals do not occur at random. Surveys have become part of mass culture, and polls on specific topics (e.g. electoral, marketing) have become elements of the subcultures of different social groups. Refusal to participate in polls therefore distorts the representation of certain social groups and subgroups in survey data. Consequently, estimates of indicator distributions in those groups are systematically biased.
In experimental studies we conducted in 2005-2010, whenever refusal nonresponse exceeded 25%, every attempt to repair the sample by weighting on age and sex led to an underestimate of the average number of children for women aged 30-45.
The explanation is straightforward: women with young children seldom have time to participate in a survey interview. This problem can still be resolved, however, by using demographic data collected by government agencies to build a stratified sample that takes the number of children in a family into account.
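The correction described above can be sketched as post-stratification: respondents are reweighted so that the sample shares of each stratum (here, number of children) match known population shares. This is a minimal illustration, not the author's procedure; the function names, the sample, and the population shares are all hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per respondent so that weighted stratum
    shares in the sample equal the given population shares."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample of women aged 30-45: number of children per respondent.
children = [0, 0, 0, 1, 0, 1, 2, 0]
# Stratum label for each respondent (0, 1, or 2+ children).
strata = ["0", "0", "0", "1", "0", "1", "2+", "0"]
# Hypothetical population shares, standing in for official demographic data.
population_shares = {"0": 0.4, "1": 0.4, "2+": 0.2}

weights = poststratification_weights(strata, population_shares)
unweighted = sum(children) / len(children)
corrected = weighted_mean(children, weights)
print(unweighted, corrected)
```

In this made-up sample childless women are over-represented, so the reweighted mean number of children comes out higher than the unweighted one.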
In developed countries, surveys with refusal nonresponse above 25-35% are common, so the problem is widespread. If the target indicator cannot be corrected with the help of government statistics, another solution is needed.
The author suggests an approach that relies on "external validity", using a "cross-validation method" as the statistical model.
Besides the "target questions" (those related to the purpose of the survey), the author suggests including in the questionnaire additional questions that are not of substantive interest but can be used to calculate indicators comparable with statistical data from an independent source.
Validity is then estimated from the degree to which the sample data deviate from the external (objective statistical) data obtained from that independent source.
For example, the quality of the sample in a survey on "shadow" wages among employed residents of St. Petersburg (Russia) was evaluated by comparing the external indicator "Earnings before income taxes", taken from official statistics, with a comparison indicator. The comparison indicator was calculated from three survey parameters: the respondent's total wages, the proportion of respondents who receive unregistered wages, and the amount of the respondent's unregistered wages.
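The abstract does not give the formula for the comparison indicator, so the following is only a sketch under one plausible assumption: that the officially recorded figure corresponds to registered wages, estimated as mean total wages minus the share of unregistered-wage recipients times their mean unregistered wage. The function names and all numbers are hypothetical, and the relative deviation is just one possible measure of the "degree of deviation".

```python
def comparison_indicator(mean_total_wage, share_unregistered,
                         mean_unregistered_wage):
    """Assumed formula: estimated mean registered wage in the sample.

    mean_unregistered_wage is averaged over recipients of unregistered
    wages only, hence the multiplication by their share."""
    return mean_total_wage - share_unregistered * mean_unregistered_wage


def relative_deviation(sample_value, external_value):
    """Signed deviation of the sample indicator from the external one,
    as a fraction of the external value."""
    return (sample_value - external_value) / external_value


# Hypothetical survey parameters (not taken from the study).
mean_total_wage = 30000.0
share_unregistered = 0.25
mean_unregistered_wage = 8000.0
# Hypothetical official "Earnings before income taxes" figure.
external_earnings = 25000.0

indicator = comparison_indicator(
    mean_total_wage, share_unregistered, mean_unregistered_wage
)
deviation = relative_deviation(indicator, external_earnings)
print(indicator, deviation)
```

A deviation close to zero would suggest the sample reproduces the external figure well; a large one would signal a sample distorted by nonresponse.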
Full Text: Full paper DOC