Circumventing construct-irrelevant variance in international assessments using cognitive diagnostic modeling: A curriculum-sensitive measure

Hannah Heister, Rolf Strietholt, Philipp Doebler, Purya Baghaei*

*Corresponding author for this work

Research output: Academic › peer review


Abstract

International large-scale assessments such as TIMSS administer achievement tests that are based on an analysis of national curricula to compare student achievement across countries. The organizations that coordinate these studies use Rasch or more generalized item response theory (IRT) models in which all test items are assumed to measure a single latent ability. The test responses are then used to estimate this ability, and the ability scores are used to compare countries.

A central yet rarely contested assumption of this approach is that the achievement tests measure an unobserved unidimensional continuous variable that is comparable across countries. One threat to this assumption is the fact that countries, and even regions or school tracks within countries, have different curricula. When seeking to fairly compare countries, it seems imperative to account for the fact that national curricula differ and that some countries may not have taught the full test content yet. Nevertheless, existing IRT-based rankings ignore such differences.

The present study proposes a direct method to deal with differing curricula and create a fair ranking of educational quality between countries. The new method compares countries solely on test content that has already been taught: it uses information on whether students have mastered skills taught in class and disregards content that has not yet been taught. Mastery is assessed via the deterministic-input, noisy, “and” gate (DINA) model, an interpretable and tractable cognitive diagnostic model. To illustrate the new method, we use data from TIMSS 1995 and compare it to the IRT-based scores published in the international study report. We find a mismatch between the TIMSS test contents and national curricula in all countries. At the same time, we observe a high correlation between the scores based on the new method and the conventional IRT scores. This finding underscores the robustness of the performance measures reported in TIMSS despite existing differences across national curricula.
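To make the mastery mechanism concrete: in the DINA model, each item is linked to the skills it requires via a row of a Q-matrix, and a student answers correctly with probability 1 − s (slip) when all required skills are mastered, and with probability g (guess) otherwise. The following is a minimal illustrative sketch of this item response function, not the authors' implementation; the function name, example Q-matrix row, and parameter values are hypothetical.

```python
import numpy as np

def dina_prob(alpha, q_row, slip, guess):
    """DINA item response probability.

    alpha : binary skill-mastery profile of a student
    q_row : binary Q-matrix row, marking skills the item requires
    Returns P(correct) = 1 - slip if all required skills are
    mastered (eta = 1), else guess.
    """
    eta = np.all(alpha >= q_row)  # has the student mastered every required skill?
    return (1.0 - slip) if eta else guess

# Hypothetical item requiring skills 1 and 3 (of three skills):
q_row = np.array([1, 0, 1])
print(dina_prob(np.array([1, 1, 1]), q_row, slip=0.1, guess=0.2))  # master: 0.9
print(dina_prob(np.array([1, 1, 0]), q_row, slip=0.1, guess=0.2))  # non-master: 0.2
```

Under the curriculum-sensitive approach described above, only items whose required skills correspond to content already taught in a country would enter that country's comparison.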
Original language: English
Article number: 101393
Number of pages: 10
Journal: Studies in Educational Evaluation
Volume: 83
Early online date: 19 Aug 2024
DOIs
Status: Published - Dec 2024
