Impacts on fit and reliability of the ordering of response categories in polytomous items
Curt Hagquist, David Andrich
Building: Holme Building
Room: MacCallum Room
Date: 2018-12-13 03:30 PM – 05:00 PM
Last modified: 2018-10-17
Abstract
Background
Psychometric analyses of measurement instruments consisting of polytomous items tend to pay insufficient attention to the empirical functioning of the response categories.
The purpose is to study the impact of the ordering of the response categories on the fit of the data to the Rasch model and on the reliability and precision of measurement.
Methods
Two data sets were analysed: Swedish data from the international Health Behaviour in School-aged Children (HBSC) study, collected on seven occasions 1986–2014 among students in grades 5, 7 and 9 (11, 13 and 15 years), and data from the Swedish regional study Young in Värmland (YiV), collected on eight occasions 1988–2011 among students in grade 9 (15–16 years).
Two different composite measures of psychosomatic problems were analysed using Rasch Measurement Theory, based on HBSC and YiV data. Both measures consisted of eight items, different in their phrasing in the two item sets but covering similar content. The number of response categories was five in both measures, but the construction of the categories was different. Both measures were analysed at an adjusted sample size of 1000.
Results
Three items in the HBSC measure showed disordered thresholds. Two pairs of categories were collapsed, resulting in three response categories. In the YiV measure, the categories in all eight items were properly ordered. Two pairs of categories were nevertheless collapsed tentatively, again resulting in three response categories.
Collapsing the categories in the HBSC measure, which had disordered thresholds, improved the overall fit and caused only a small decrease in the person separation index (PSI), from 0.74385 to 0.72995. In contrast, collapsing the categories in the YiV measure, which had ordered item thresholds, worsened the overall fit and caused a large decrease in the PSI, from 0.83035 to 0.68345.
Conclusions
The minor decrease in the PSI when two pairs of response categories were collapsed in the HBSC measure is notable given that the number of data points was halved, from 32 to 16. Usually, the number of data points has a great impact on reliability, i.e. the more categories, the higher the reliability. The negative impact on fit and reliability of collapsing categories in the YiV measure confirms that collapsing properly ordered categories is likely to destroy the properties of a measure. The results highlight the importance of properly ordered response categories and the negative impact of badly constructed response categories.
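The figures 32 and 16 are consistent with counting one threshold per adjacent pair of categories in each item, which is an interpretation rather than something the abstract states. The short Python sketch below reproduces that arithmetic and illustrates a recoding from five categories to three. The function name and the recoding map are hypothetical: the abstract does not say which pairs of categories were collapsed.

```python
def threshold_count(n_items, n_categories):
    """Total thresholds, assuming each item with m ordered categories has m - 1."""
    return n_items * (n_categories - 1)


# Hypothetical recoding of five categories (scored 0-4) into three (scored 0-2).
# The pairs actually collapsed in the study are not given in the abstract.
HYPOTHETICAL_RECODING = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}


def collapse(responses, recoding=HYPOTHETICAL_RECODING):
    """Apply the recoding to one person's item responses."""
    return [recoding[r] for r in responses]


points_before = threshold_count(n_items=8, n_categories=5)
points_after = threshold_count(n_items=8, n_categories=3)

# One made-up response vector for an eight-item measure, for illustration only.
example_responses = [0, 1, 2, 3, 4, 2, 1, 0]
example_collapsed = collapse(example_responses)
```

Under this reading, `points_before` and `points_after` correspond to the halving described above, and the maximum raw score falls by the same factor.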