Idéfix: identifying accidental sample mix-ups in biobanks using polygenic scores

Lifelines Cohort Study, Robert Warmerdam, Pauline Lanting, Patrick Deelen, Lude Franke*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

MOTIVATION: Identifying sample mix-ups in biobanks is essential before genetic data can be repurposed for clinical pharmacogenetics, because pharmacogenetic advice based on another individual's genetic information is potentially harmful. Existing methods for identifying mix-ups are limited to datasets in which additional omics data (e.g., gene expression) are available. Cohorts lacking such data can rely only on sex, which can reveal only half of the mix-ups. Here, we describe Idéfix, a method for identifying accidental sample mix-ups in biobanks using polygenic scores.

RESULTS: In the Lifelines population-based biobank, we calculated polygenic scores (PGSs) for 25 traits for 32,786 participants. Idéfix compares the measured phenotypes with the PGSs and exploits the greater phenotype–PGS discordance that is expected for mix-ups relative to correctly linked samples. In a simulation with induced mix-ups, Idéfix reaches an area under the curve (AUC) of 0.90 using 25 polygenic scores and sex. This is a substantial improvement over using sex alone, which yields an AUC of 0.75. Further simulations show Idéfix's potential in varying datasets with more powerful PGSs, suggesting that its performance will likely improve as better-powered GWASs for commonly measured traits become available. Idéfix can be used to identify a set of high-quality participants who are very unlikely to reflect sample mix-ups; for these participants, the genetic data can be used for clinical purposes, such as pharmacogenetic profiles. For instance, in Lifelines we can select 34.4% of participants, reducing the sample mix-up rate from 0.15% to 0.01%.
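The abstract does not spell out Idéfix's scoring statistic, so the following is only a minimal sketch of the general idea, not the published implementation. It assumes each standardized phenotype depends linearly on its standardized PGS with correlation r. A correctly linked sample then has a phenotype distributed around r × PGS with variance 1 − r², whereas a mixed-up sample's phenotype is unrelated to the PGS it was paired with. Each sample receives a hypothetical log-likelihood ratio ("mixed up" vs "correct") summed over traits. Mix-ups are induced by swapping the genetic data of random sample pairs, performance is summarized with a Mann–Whitney AUC, and low-scoring samples are kept as a "high-quality" subset. The function names, the correlation of 0.3 for all 25 traits, the sample sizes and the threshold are illustrative assumptions, and sex is not modeled.

```python
import math
import random
import statistics


def standardize(values):
    """Z-scores using the population mean and standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def mixup_scores(phenotypes, pgs):
    """Hypothetical per-sample log-likelihood ratio: 'mixed up' vs 'correct'.

    phenotypes[t][i] and pgs[t][i] hold trait t for sample i. Higher scores
    mean the phenotypes fit an unrelated person better than the sample's own
    PGSs. Illustrative statistic only, not Idefix's published one.
    """
    n = len(phenotypes[0])
    scores = [0.0] * n
    for y_raw, g_raw in zip(phenotypes, pgs):
        y, g = standardize(y_raw), standardize(g_raw)
        r = sum(a * b for a, b in zip(y, g)) / n  # Pearson correlation
        var_correct = 1.0 - r * r  # residual variance if the PGS matches
        for i in range(n):
            resid = y[i] - r * g[i]
            # log N(y; 0, 1) - log N(y; r*g, 1 - r^2); the 2*pi terms cancel
            scores[i] += (-0.5 * y[i] ** 2
                          + 0.5 * resid ** 2 / var_correct
                          + 0.5 * math.log(var_correct))
    return scores


def auc(scores, is_mixup):
    """Mann-Whitney AUC: P(mix-up scores higher than a correct sample)."""
    pos = [s for s, m in zip(scores, is_mixup) if m]
    neg = [s for s, m in zip(scores, is_mixup) if not m]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n=2000, r_values=(0.3,) * 25, n_swaps=20, seed=1):
    """Toy cohort with a PGS-phenotype correlation r per trait (assumed)."""
    rng = random.Random(seed)
    pgs, phenotypes = [], []
    for r in r_values:
        g = [rng.gauss(0, 1) for _ in range(n)]
        y = [r * gi + math.sqrt(1 - r * r) * rng.gauss(0, 1) for gi in g]
        pgs.append(g)
        phenotypes.append(y)
    # Induce mix-ups: swap all PGSs (the genetic data) of random sample pairs.
    is_mixup = [False] * n
    chosen = rng.sample(range(n), 2 * n_swaps)
    for a, b in zip(chosen[::2], chosen[1::2]):
        for g in pgs:
            g[a], g[b] = g[b], g[a]
        is_mixup[a] = is_mixup[b] = True
    return phenotypes, pgs, is_mixup


def select_low_risk(scores, threshold):
    """Indices of samples scoring below the threshold (kept as low risk)."""
    return [i for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    phenotypes, pgs, is_mixup = simulate()
    scores = mixup_scores(phenotypes, pgs)
    print(f"AUC: {auc(scores, is_mixup):.2f}")
    kept = select_low_risk(scores, threshold=-2.0)
    rate = sum(is_mixup[i] for i in kept) / len(kept)
    print(f"kept {len(kept) / len(scores):.1%}, mix-up rate {rate:.2%}")
```

Raising the threshold keeps more participants at the cost of a higher residual mix-up rate. This toy setup is not expected to reproduce the AUC or selection figures reported above.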

AVAILABILITY: Idéfix is freely available at https://github.com/molgenis/systemsgenetics/wiki/Idefix.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Original language: English
Pages (from-to): 1059–1066
Number of pages: 8
Journal: Bioinformatics (Oxford, England)
Volume: 38
Issue number: 4
Early online date: 18-Nov-2021
Publication status: Published - 15-Feb-2022
