OBJECTIVE To determine the effectiveness of single-point benchmarking and longitudinal benchmarking for inter-school educational evaluation.
METHODS We carried out a mixed longitudinal and cross-sectional study using data from 24 measurement moments per year (4 tests × 6 year groups) collected over 4 years. The 4 annual progress tests assess the graduation-level knowledge of all students at 3 co-operating medical schools. Participants were about 5000 undergraduate medical students from these schools. The main outcome measures were between-school comparisons of progress test results based on the different benchmarking methods.
RESULTS Variations in relative school performance across tests and year groups indicate that single-point benchmarking is unstable and has low reliability: it is subject to distortions caused by school-test and year group-test interaction effects. Deviations of school means from the overall mean follow an irregular, noisy pattern that obscures systematic between-school differences. The longitudinal benchmarking method suppresses this noise and reveals the systematic differences. The pattern of a school's cumulative deviations per year group gives a credible reflection of the relative performance of year groups.
CONCLUSIONS Even with highly comparable curricula, single-point benchmarking can distort the results of between-school comparisons. If longitudinal data are available, the information contained in a school's cumulative deviations from the overall mean can be used instead. In such a case, the mean test score across schools is a useful benchmark for cross-institutional comparison.
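The idea of cumulative deviations from the overall mean can be illustrated with a minimal sketch. The abstract does not specify the exact computation, so the following are assumptions: the benchmark for each test is the unweighted mean of the school means, deviations are accumulated in chronological test order, and the function name `cumulative_deviations` and all scores are hypothetical, made up purely for illustration.

```python
from itertools import accumulate


def cumulative_deviations(school_means):
    """Return each school's running sum of deviations from the cross-school mean.

    school_means maps a school label to a list of mean test scores,
    one per test, in chronological order (hypothetical input layout).
    """
    schools = list(school_means)
    n_tests = len(school_means[schools[0]])

    # Assumed benchmark: unweighted mean of the school means for each test.
    benchmark = [
        sum(school_means[s][t] for s in schools) / len(schools)
        for t in range(n_tests)
    ]

    result = {}
    for s in schools:
        # Single-point view: deviation of this school from the benchmark per test.
        deviations = [school_means[s][t] - benchmark[t] for t in range(n_tests)]
        # Longitudinal view: running total of those deviations across tests.
        result[s] = list(accumulate(deviations))
    return result


# Made-up mean scores for three schools on four consecutive tests.
example = {
    "A": [52.0, 55.0, 50.0, 57.0],
    "B": [50.0, 53.0, 52.0, 54.0],
    "C": [48.0, 54.0, 51.0, 54.0],
}

for school, cumulative in cumulative_deviations(example).items():
    print(school, cumulative)
```

In this made-up example, school A falls below the benchmark on the third test, so a single-point comparison at that moment would rank it last, while its cumulative deviation remains positive throughout. This is the kind of test-specific fluctuation that accumulation over several tests is meant to smooth out.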
- multicentre study [publication type]
- education, medical, undergraduate
- educational measurement
- programme evaluation
- inter-institutional relations, schools, medical