Developing Effective Questionnaire-Based Prediction Models for Type 2 Diabetes for Several Ethnicities

Michail Kokkorakis*, Pytrik Folkertsma, Sipko van Dam, Nicole Sirotin, Shahrad Taheri, Odette Chagoury, Youssef Idaghdour, Robert H. Henning, Jose Castela Forte, Christos S. Mantzoros, Dylan H. de Vries, Bruce H.R. Wolffenbuttel

*Corresponding author for this work

Research output: Working paper / Preprint / Academic


Background: Type 2 diabetes disproportionately affects individuals of non-White ethnicity through a complex interaction of multiple factors. Early disease prediction and detection are therefore essential and require tools that can be deployed at large scale. We aimed to address this need by developing questionnaire-based prediction models for type 2 diabetes for multiple ethnicities.

Methods: Logistic regression models, using questionnaire-only features, were trained on the White population of the UK Biobank, and validated in five other ethnicities and externally in Lifelines. In total, 631,748 individuals were included for prevalence prediction and 67,083 individuals for the eight-year incidence prediction. Predictive accuracy was assessed and a detailed sensitivity analysis was conducted to assess potential clinical utility. Furthermore, we compared the questionnaire algorithms to clinical non-laboratory type 2 diabetes risk tools.

Findings: Our algorithms accurately predicted type 2 diabetes prevalence (AUC=0·901) and eight-year incidence (AUC=0·873) in the White UK Biobank population. Both models replicated well in Lifelines, with AUCs of 0·917 and 0·817 for prevalence and incidence, respectively. Both models performed consistently well across ethnicities, with AUCs ranging from 0·855 to 0·894 for prevalence and from 0·819 to 0·883 for incidence. These models generally outperformed two clinically validated non-laboratory tools and correctly reclassified >3,000 type 2 diabetes cases. Model performance improved with the addition of blood biomarkers, but not with the addition of physical measurements.
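
As an illustration of the per-ethnicity evaluation described above, the following is a minimal sketch of how an AUC could be computed separately for each group from predicted risk scores. It is not the authors' code: the function names `auc` and `auc_by_group`, the group labels, and the toy data are all hypothetical, and the pairwise definition used here is chosen for readability rather than speed.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score); returns {group: AUC}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data: (group, has_diabetes, predicted_risk).
toy = [
    ("group_a", 1, 0.9), ("group_a", 0, 0.2),
    ("group_a", 1, 0.6), ("group_a", 0, 0.7),
    ("group_b", 1, 0.8), ("group_b", 0, 0.1),
    ("group_b", 0, 0.3), ("group_b", 1, 0.3),
]
result = auc_by_group(toy)
print(result)
```

The pairwise comparison costs O(n²) per group, so for cohorts of the size reported here a rank-based implementation from an established statistics library would be the practical choice.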

Interpretation: Easy-to-implement, questionnaire-based models can predict prevalent and incident type 2 diabetes with high accuracy across all ethnicities, providing a highly scalable solution for population-wide risk stratification.
Original language: English
Publication status: Published - 14-Jun-2023
