ACSPRI Conferences, ACSPRI Social Science Methodology Conference 2010


Resolving Differential Item Functioning by split of items

Curt Hagquist

Building: Holme Building
Room: Sutherland Room
Date: 2010-12-03 01:30 PM – 03:00 PM
Last modified: 2010-11-17

Abstract


In order to enable invariant comparisons, a measurement scale needs to work in the same way across the different sample groups that are to be compared. When a scale suffers from Differential Item Functioning (DIF), by contrast, comparisons between groups may become invalid. While the potential problems caused by DIF items are well recognised, methods to detect, quantify and resolve DIF still need to be elaborated. In particular, more attention should be paid to methods capable of distinguishing real DIF from artificial DIF, the latter being an artefact of the procedure used to identify DIF. The purpose of the present paper is to demonstrate how DIF can be resolved using the Rasch model. An iterative, stepwise procedure that distinguishes between real and artificial DIF is demonstrated.


DIF is a common problem in measurement scales intended to tap information about perceived mental health. The present paper uses cross-sectional, nationwide questionnaire data on mental health collected among adolescents in Sweden. Two scales are analysed: a scale of emotional problems and a scale of psychosomatic problems, consisting of seven and eight polytomous items, respectively. Possible DIF is examined across age, country of birth and gender.


The data are analysed using the Rasch model. Analysis of variance (ANOVA) of standardised residuals is conducted to detect possible DIF, both uniform and non-uniform. DIF is then resolved by a stepwise, iterative procedure based on the principles of equating, in which the problematic items are split one at a time:

First, the original set of items is analysed for DIF using ANOVA.

Second, the item showing the greatest DIF is split into two items, e.g. one item for boys and one for girls in the case of gender DIF.

Third, the revised item set, now consisting of the remaining original items and the two gender-specific items, is re-analysed using ANOVA to examine whether any items still show DIF requiring a split.
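
The ANOVA in the first and third steps can be sketched as a two-way analysis of one item's standardised residuals, with the person factor (e.g. gender) and class intervals of person location as the two factors: a significant main effect of the factor points to uniform DIF, a significant interaction to non-uniform DIF. This is a minimal Python/NumPy sketch under stated assumptions, not the software used in the paper. The residuals `z` and the class-interval labels are assumed to come from a prior Rasch analysis (grouping persons into class intervals by estimated location is itself an assumption here), effects are tested with Type II sums of squares from nested least-squares fits, and the names `dif_anova`, `_dummies` and `_rss_rank` are hypothetical.

```python
import numpy as np


def _dummies(labels):
    """Treatment-coded indicator columns for every level except the first."""
    levels = np.unique(labels)
    return np.column_stack([(labels == lev).astype(float) for lev in levels[1:]])


def _rss_rank(design, y):
    """Residual sum of squares and rank of an ordinary least-squares fit."""
    beta, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), int(rank)


def dif_anova(z, group, class_interval):
    """Hypothetical helper: F ratios for uniform and non-uniform DIF in one item.

    z: standardised residuals of one item, one value per person (assumed given)
    group: person factor, e.g. gender
    class_interval: class-interval label of each person (assumed given)
    Returns (F_uniform, F_nonuniform).
    """
    z = np.asarray(z, dtype=float)
    g = _dummies(np.asarray(group))
    c = _dummies(np.asarray(class_interval))
    # Interaction columns: every group dummy times every class dummy.
    gc = np.column_stack([g[:, [i]] * c for i in range(g.shape[1])])
    one = np.ones((z.size, 1))
    rss_c, r_c = _rss_rank(np.hstack([one, c]), z)
    rss_cg, r_cg = _rss_rank(np.hstack([one, c, g]), z)
    rss_full, r_full = _rss_rank(np.hstack([one, c, g, gc]), z)
    ms_error = rss_full / (z.size - r_full)
    # Main effect of group, adjusted for class interval: uniform DIF.
    f_uniform = ((rss_c - rss_cg) / (r_cg - r_c)) / ms_error
    # Group-by-class-interval interaction: non-uniform DIF.
    f_nonuniform = ((rss_cg - rss_full) / (r_full - r_cg)) / ms_error
    return f_uniform, f_nonuniform
```

The F ratios would still need to be referred to the appropriate F distributions to judge significance; that step is omitted here.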


The DIF analyses of each scale show that some items favour one group and some items favour the other, giving the impression that DIF cancels out. This impression is mistaken: it is caused by artificial DIF, as the stepwise procedure for resolving DIF confirms. Once the real DIF items have been resolved, some items that indicated DIF in the initial analysis are DIF-free. Moreover, comparing the mean values of each group (e.g. boys vs. girls) on the initial scale with those on the revised scales resolved for DIF contradicts the impression that DIF cancels out, showing that DIF also has a significant effect on person measurement.
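
One way artificial DIF can arise is illustrated by the deliberately simplified simulation below, which is not the analysis reported here. It assumes dichotomous rather than polytomous items, raw rather than standardised residuals, item difficulties held fixed at known values, and person locations estimated by maximum likelihood. Under these assumptions each person's residuals sum to zero across items, so when one item has real DIF in favour of a focal group, the remaining items must together show an equal and opposite residual gap for that group. Giving item 0 a group-specific difficulty stands in for the item split and should shrink the gap on item 0 toward zero. All names and values are hypothetical.

```python
import numpy as np


def p_correct(theta, b):
    """Dichotomous Rasch probability; theta has shape (n,), b has shape (n, k)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))


def ml_theta(x, b, iters=60):
    """ML person locations with item difficulties b held fixed.

    Solves sum_i P_ni(theta) = r_n by bisection; valid for non-extreme scores only.
    """
    r = x.sum(axis=1)
    lo, hi = np.full(len(r), -10.0), np.full(len(r), 10.0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        below = p_correct(mid, b).sum(axis=1) < r  # expected score too low: move up
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return (lo + hi) / 2


def dif_gap(x, b, focal):
    """Focal-group minus reference-group mean raw residual, per item."""
    resid = x - p_correct(ml_theta(x, b), b)
    return resid[focal].mean(axis=0) - resid[~focal].mean(axis=0)


rng = np.random.default_rng(2010)
n, k = 4000, 8
b_ref = np.linspace(-1.5, 1.5, k)
focal = rng.random(n) < 0.5          # hypothetical grouping, e.g. girls vs boys
b_true = np.tile(b_ref, (n, 1))
b_true[focal, 0] -= 1.0              # real DIF: item 0 is 1 logit easier for the focal group
theta = rng.standard_normal(n)
x = (rng.random((n, k)) < p_correct(theta, b_true)).astype(float)

keep = (x.sum(axis=1) > 0) & (x.sum(axis=1) < k)  # drop extreme scores
x, focal, b_true = x[keep], focal[keep], b_true[keep]

before = dif_gap(x, np.tile(b_ref, (len(x), 1)), focal)  # one item-0 difficulty for all
after = dif_gap(x, b_true, focal)                         # item 0 split by group
print(np.round(before, 3))
print(np.round(after, 3))
```

In `before`, item 0 shows a positive gap and the other seven items share a negative gap of the same total size, even though only item 0 was given DIF.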


In conclusion, real DIF has to be distinguished from artificial DIF, which is an artefact of the procedure for identifying DIF. Items showing DIF should be resolved step by step, one item at a time. If items are resolved on the basis of a simultaneous analysis, DIF-free items may be mistaken for DIF items.
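
The one-item-at-a-time rule can be sketched as a loop that splits only the item with the largest DIF statistic and then re-analyses the revised item set. In this minimal sketch `split_item`, `resolve_dif`, the `dif_statistic` callback and the `critical` threshold are all hypothetical. In practice the callback would re-fit the Rasch model to the revised data, treating the split responses as structurally missing, and return a DIF statistic such as the ANOVA F for each item.

```python
from typing import Callable, Dict, List, Set, Tuple

import numpy as np

Data = Dict[str, np.ndarray]


def split_item(data: Data, item: str, group: np.ndarray) -> Tuple[Data, List[str]]:
    """Replace `item` with one column per group; other groups' responses become NaN."""
    out = {name: col for name, col in data.items() if name != item}
    new_names = []
    for g in np.unique(group):
        col = data[item].astype(float)  # astype returns a copy
        col[group != g] = np.nan        # structurally missing outside group g
        name = f"{item}_{g}"
        out[name] = col
        new_names.append(name)
    return out, new_names


def resolve_dif(
    data: Data,
    group: np.ndarray,
    dif_statistic: Callable[[Data, np.ndarray], Dict[str, float]],
    critical: float,
    max_splits: int = 10,
) -> Tuple[Data, List[str]]:
    """Split the single item with the largest DIF statistic, re-analyse, repeat."""
    data = dict(data)
    split_cols: Set[str] = set()
    order: List[str] = []
    for _ in range(max_splits):
        # Re-analyse the current item set (hypothetical callback).
        stats = dif_statistic(data, group)
        # Group-specific columns cannot show DIF across the same grouping.
        stats = {k: v for k, v in stats.items() if k not in split_cols}
        if not stats:
            break
        worst = max(stats, key=stats.get)
        if stats[worst] <= critical:
            break  # no remaining item needs a split
        data, new_names = split_item(data, worst, group)
        split_cols.update(new_names)
        order.append(worst)
    return data, order
```

Because the statistic is recomputed after every split, an item whose apparent DIF was only artificial can fall below the threshold and is then never split.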