Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration

Research output: Contribution to journalArticleAcademicpeer-review

35 Citations (Scopus)
351 Downloads (Pure)

Abstract

BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference.

FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java 'Genotype-IO' API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics.

CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines.

Original languageEnglish
Article number901
Number of pages4
JournalBMC Research Notes
Volume7
DOIs
Publication statusPublished - 11-Dec-2014

Cite this