Abstract
Single-task writing assessments, as used in longitudinal studies, have raised concerns about their reliability. Using Generalizability Theory (GT), this study investigated the reliability of L2 writing assessments scored on different complexity, accuracy, and fluency (CAF) measures. It focused on a) the reliability of single-task writing assessments and on the effects of b) task topic and c) task-taking occasion on assessment reliability. We analyzed analytic quantitative scores obtained from five CAF measures in a 1-day dataset and a 21-day dataset, consisting of 90 essays written by 18 Chinese learners of English who received no formal language instruction during the investigation. The results show that although some CAF scores (e.g., fluency) from single-task assessments are distinctly more reliable than others, single-task assessments are, in general, not reliable from a GT perspective. Task topic introduces some score variance, yet the amount differs markedly across CAF measures owing to functional variability. This is consistent with the Complex Dynamic Systems Theory assumption that the subsystems of an L2 do not develop synchronously. Finally, occasion (i.e., whether two samples were written on the same day or within 21 days) introduces hardly any score variance.
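The GT reasoning behind the claim that single-task scores are unreliable can be illustrated with a one-facet persons × tasks (p × t) G study. This is a deliberate simplification: the study's own design also involves occasions, and the sketch below is not the authors' analysis. Assuming a fully crossed design with one score per learner per task, it estimates variance components from two-way ANOVA mean squares and computes the relative G coefficient Eρ² = σ²_p / (σ²_p + σ²_pt,e / n_t). With n_t = 1, the person-by-task interaction (confounded with residual error) is not averaged out, so a single-task score tends to be less dependable than a multi-task average. The function names `g_study_p_by_t` and `g_coefficient` and the demo scores are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def g_study_p_by_t(scores):
    """Estimate variance components for a crossed persons x tasks (p x t) design.

    Hypothetical helper. scores[p][t] is the single score of person p on task t;
    at least 2 persons and 2 tasks are required. Returns (var_p, var_t, var_pt_e),
    with negative estimates of var_p and var_t truncated to 0.
    """
    n_p = len(scores)
    n_t = len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    task_means = [mean(scores[p][t] for p in range(n_p)) for t in range(n_t)]

    # Sums of squares for a two-way random-effects ANOVA without replication
    ss_p = n_t * sum((m - grand) ** 2 for m in person_means)
    ss_t = n_p * sum((m - grand) ** 2 for m in task_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_t  # p x t interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))

    # Solve the expected-mean-square equations for the variance components
    var_pt_e = ms_res
    var_p = max((ms_p - ms_res) / n_t, 0.0)
    var_t = max((ms_t - ms_res) / n_p, 0.0)
    return var_p, var_t, var_pt_e


def g_coefficient(var_p, var_pt_e, n_tasks):
    """Relative G coefficient for a D study that averages over n_tasks tasks."""
    return var_p / (var_p + var_pt_e / n_tasks)


if __name__ == "__main__":
    # Hypothetical scores: 3 learners x 2 task topics (illustration only)
    demo = [[1, 3], [3, 1], [5, 5]]
    vp, vt, vres = g_study_p_by_t(demo)
    for n in (1, 2, 4):
        # prints "1 0.5", then "2 0.667", then "4 0.8"
        print(n, round(g_coefficient(vp, vres, n), 3))
```

In this toy example the coefficient rises from 0.5 with one task to 0.8 with four, which mirrors the general GT point that reliability depends on how many tasks a score is averaged over. A design with occasions as a second facet would add further variance components (σ²_o, σ²_po, and so on) to the denominator.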
Field | Value |
---|---|
Original language | English |
Article number | 100950 |
Number of pages | 13 |
Journal | Journal of Second Language Writing |
Volume | 59 |
Early online date | 5-Dec-2022 |
Publication status | Published - Mar-2023 |