A Comparison of the Efficacies of Differential Item Functioning Detection Methods


Creative Commons License

BAŞMAN M.

INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, vol.10, no.1, pp.145-159, 2023 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 10 Issue: 1
  • Publication Date: 2023
  • DOI: 10.21449/ijate.1135368
  • Journal Name: INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION
  • Journal Indexed In: Emerging Sources Citation Index (ESCI), ERIC (Education Resources Information Center), TR DİZİN (ULAKBİM)
  • Pages: pp.145-159
  • Keywords: Crossing simultaneous item bias test, Differential item functioning, Logistic Regression, Lord's chi-square, Mantel-Haenszel, Raju's area measure, MANTEL-HAENSZEL PROCEDURE, LOGISTIC-REGRESSION, I ERROR, SAMPLE-SIZES, DIF, SIBTEST, POWER, IDENTIFICATION, MODEL
  • Marmara University Affiliation: Yes

Abstract

Ensuring the validity of a test requires checking that all of its items produce similar results across different groups of individuals. Differential item functioning (DIF) occurs when individuals of equal ability from different groups perform differently on the same test item. Several methods exist to identify items that show DIF. They draw on Item Response Theory and Classical Test Theory, and each has its own advantages and limitations. This study aims to compare the performance of five DIF detection methods. The efficacies of the Mantel-Haenszel (MH), Logistic Regression (LR), Crossing simultaneous item bias test (CSIBTEST), Lord's chi-square (LORD), and Raju's area measure (RAJU) methods are examined under varying conditions of sample size, DIF ratio, and test length. To compare the methods, power and Type I error rates are evaluated in a simulation study with 100 replications for each condition. Results show that LR and MH have the lowest Type I error rates and the highest power in detecting uniform DIF. In addition, CSIBTEST has a power rate similar to those of MH and LR. Under the DIF conditions studied, sample size, DIF ratio, test length, and their interactions affect Type I error and power rates.
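The abstract identifies Mantel-Haenszel (MH) as one of the two best-performing methods. The Python sketch below illustrates what that procedure computes for a single studied item. It returns the standard continuity-corrected MH chi-square (1 df) and the MH common odds ratio, assuming examinees have already been matched into total-score strata and the 2x2 counts tabulated. It also reports the conventional ETS delta rescaling, -2.35 ln(alpha_MH). The function names, the tuple layout, and the `rejection_rate` helper are hypothetical illustrations. They are not the study's own code or any particular package's API.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical layout: one 2x2 table per matched score level k, as
# (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
Stratum = Tuple[int, int, int, int]


@dataclass
class MHResult:
    chi_square: float  # continuity-corrected MH chi-square, 1 df
    p_value: float     # upper-tail probability of chi_square
    alpha_mh: float    # MH common odds ratio (reference vs. focal)
    delta_mh: float    # ETS delta metric: -2.35 * ln(alpha_mh)


def mantel_haenszel_dif(strata: Iterable[Stratum]) -> MHResult:
    """Sketch of the MH DIF statistic pooled over score strata."""
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # hypergeometric variance is undefined for n < 2
        n_ref, n_foc = a + b, c + d
        m_correct, m_incorrect = a + c, b + d
        sum_a += a
        sum_expected += n_ref * m_correct / n          # E(A_k)
        sum_var += (n_ref * n_foc * m_correct * m_incorrect
                    / (n * n * (n - 1)))               # Var(A_k)
        num += a * d / n
        den += b * c / n
    if sum_var == 0 or num == 0 or den == 0:
        raise ValueError("degenerate tables: MH statistic is undefined")
    # Continuity correction, floored at zero so tiny gaps do not inflate it.
    chi = max(0.0, abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    # For a chi-square with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi / 2.0))
    alpha = num / den
    return MHResult(chi, p, alpha, -2.35 * math.log(alpha))


def rejection_rate(flags: List[bool]) -> float:
    """Share of replications in which an item was flagged.

    Over DIF-free items this estimates the Type I error rate;
    over items simulated with DIF it estimates power.
    """
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Identical response patterns in both groups: no evidence of DIF.
    print(mantel_haenszel_dif([(20, 10, 20, 10), (15, 15, 15, 15)]))
    # Reference group does better at every score level.
    print(mantel_haenszel_dif([(25, 5, 15, 15), (20, 10, 10, 20)]))
```

With the ETS convention used here, alpha_MH > 1 (a negative delta) means the reference group has higher odds of answering correctly at matched ability. In a simulation like the one described, one would apply such a test to each item in each of the 100 replications. The resulting flags would then be averaged with something like `rejection_rate`.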