Data-driven assessment, contextualisation and implementation of 134 variables in the risk for type 2 diabetes: an analysis of Lifelines, a prospective cohort study in the Netherlands

Thomas P. van der Meer, Bruce H. R. Wolffenbuttel, Chirag J. Patel*

*Corresponding author for this work

Research output: Contribution to journalArticleAcademicpeer-review

1 Citation (Scopus)
18 Downloads (Pure)


Aims/hypothesis We aimed to assess and contextualise 134 potential risk variables for the development of type 2 diabetes and to determine their applicability in risk prediction. Methods A total of 96,534 people without baseline diabetes (372,007 person-years) from the Dutch Lifelines cohort were included. We used a risk variable-wide association study (RV-WAS) design to independently screen and replicate risk variables for 5-year incidence of type 2 diabetes. For identified variables, we contextualised HRs, calculated correlations and assessed their robustness and unique contribution in different clinical contexts using bootstrapped and cross-validated lasso regression models. We evaluated the change in risk, or 'HR trajectory', when sequentially assigning variables to a model. Results We identified 63 risk variables, with novel associations for quality-of-life indicators and non-cardiovascular medications (i.e., proton-pump inhibitors, anti-asthmatics). For continuous variables, the increase of 1 SD of HbA(1c), i.e., 3.39 mmol/mol (0.31%), was equivalent in risk to an increase of 0.53 mmol/l of glucose, 19.8 cm of waist circumference, 8.34 kg/m(2) of BMI, 0.67 mmol/l of HDL-cholesterol, and 0.14 mmol/l of uric acid. Other variables required an increase of >3 SD, which is not physiologically realistic or a rare occurrence in the population. Though moderately correlated, the inclusion of four variables satiated prediction models. Invasive variables, except for glucose and HbA(1c), contributed little compared with non-invasive variables. Glucose, HbA(1c) and family history of diabetes explained a unique part of disease risk. Adding risk variables to a satiated model can impact the HRs of variables already in the model. Conclusions Many variables show weak or inconsistent associations with the development of type 2 diabetes, and only a handful can reliably explain disease risk. Newly discovered risk variables will yield little over established factors, and existing prediction models can be simplified. A systematic, data-driven approach to identify risk variables for the prediction of type 2 diabetes is necessary for the practice of precision medicine.

Original languageEnglish
Pages (from-to)1268-1278
Number of pages11
Issue number6
Publication statusPublished - Jun-2021


  • Contextualisation
  • Data-driven
  • Identification
  • Lasso regression
  • Machine learning
  • Prediction models
  • Prospective
  • Risk variable-wide association study
  • Type 2 diabetes

Cite this