Abstract
Tracing L2 development through single samples collected longitudinally, which are often rated on quantitative complexity, accuracy, and fluency (CAF) measures, is a classic approach in Complex Dynamic System Theory (CDST) research. The reliability of those single samples, however, has been questioned by L2 assessment research using Generalizability Theory (GT) (e.g., Schoonen, 2005). Wu et al. (2022) therefore used GT to test the reliability of carefully restricted single-task assessments rated on five CAF measures. They found that the reliability of the CAF scores differed substantially and that there seemed to be a difference between global and specific CAF measures.
This inspired the current study to assess the reliability of CAF measures commonly used in assessing L2 (English) speaking, and to investigate which characteristics of the measures affect their reliability. To this end, we searched Web of Science for L2 studies of English oral production published between 2016 and 2021, from which we selected 57 quantitative CAF measures used by more than two articles without overlapping authors. We used GT to test the reliability of the 57 measures on 275 recordings collected from 55 Chinese learners of English, each of whom individually performed five oral tasks on different topics back to back. In addition, the role of two characteristics of the CAF measures, namely their specificity (i.e., global or specific) and their rating procedure (i.e., automatic or manual), was examined with independent-samples t-tests.
Results from the GT analysis show that quantitative CAF measures differ in reliability (Wu et al., 2022); the independent-samples t-tests confirmed that global measures are more reliable than specific ones, but did not find a difference between automatically and manually rated measures. These findings can inform CDST studies that rely on single samples collected longitudinally about which CAF measures have high reliability, i.e., are stable at a given moment in time, and can therefore be used to distinguish L2 development from other kinds of variability. When tracing the development of certain low-reliability CAF measures (e.g., mean number of modifiers per noun phrase), on the other hand, it would be necessary to collect multiple samples at each data point and to compare the variability within and between data points.
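To make the reliability notion concrete, the following is a minimal sketch of a one-facet generalizability study for a single CAF measure, assuming a fully crossed persons × tasks design with one score per cell (as with 55 learners each performing five tasks). The abstract does not specify the exact GT design or software used, so the function name `g_study`, its parameters, and the choice of the relative G coefficient are illustrative assumptions rather than the authors' procedure.

```python
from statistics import mean


def g_study(scores, n_tasks_decision=1):
    """Hypothetical helper: one-facet (persons x tasks) G-study.

    scores[p][t] is the score of person p on task t (fully crossed,
    one observation per cell). n_tasks_decision is the number of tasks
    averaged over in the decision study; 1 corresponds to relying on a
    single sample per learner.
    """
    n_p, n_t = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    task_means = [mean(row[t] for row in scores) for t in range(n_t)]

    # Mean squares from the two-way ANOVA without replication.
    ms_p = n_t * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_t = n_p * sum((m - grand) ** 2 for m in task_means) / (n_t - 1)
    ss_res = sum(
        (scores[p][t] - person_means[p] - task_means[t] + grand) ** 2
        for p in range(n_p)
        for t in range(n_t)
    )
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))

    # Estimated variance components; negative estimates are set to zero.
    var_p = max((ms_p - ms_res) / n_t, 0.0)
    var_t = max((ms_t - ms_res) / n_p, 0.0)
    var_res = ms_res  # person-by-task interaction confounded with error

    # Relative G coefficient: person variance over person variance
    # plus relative error variance for the chosen number of tasks.
    denom = var_p + var_res / n_tasks_decision
    g = var_p / denom if denom > 0 else float("nan")
    return {
        "var_person": var_p,
        "var_task": var_t,
        "var_residual": var_res,
        "g": g,
    }
```

Under these assumptions, calling `g_study(scores)` with the default `n_tasks_decision=1` estimates how dependable a single sample is, while a larger value shows how much the coefficient would rise if several samples were averaged at each data point.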
Original language | English |
---|---|
Publication status | Published - Jul-2023 |
Event | The 20th World Congress of International Association of Applied Linguistics (AILA) - Lyon, France Duration: 17-Jul-2023 → 20-Jul-2023 |
Conference
Conference | The 20th World Congress of International Association of Applied Linguistics (AILA) |
---|---|
Country/Territory | France |
City | Lyon |
Period | 17/07/2023 → 20/07/2023 |