Abstract
BACKGROUND: Long reads provide valuable information about the sequence composition of genomes. However, they are usually very noisy, which makes aligning them to a reference genome a daunting task. Processing datasets large enough to sequence a human genome may take days on a single node. Hence, it is of primary importance to have an aligner that can run on distributed clusters of computers with high accuracy and speed.
RESULTS: In this paper, we present IMOS, an aligner for mapping noisy long reads to a reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM) that enhances both its accuracy and speed. IM is up to 6x faster than the original Meta-aligner. IMOS is also implemented to run IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Moreover, multi-node IMOS is faster than SparkBWA when executing both IM (1.5x) and Minimap2 (25x).
CONCLUSION: In this paper, we propose an architecture for mapping long reads to a reference. Owing to its implementation, the speed of IMOS can increase almost linearly with the number of nodes in a cluster. It is also a multi-platform application that runs on Linux, Windows, and macOS.
Original language | English |
---|---|
Article number | 51 |
Number of pages | 14 |
Journal | BMC Bioinformatics |
Volume | 20 |
Issue number | 1 |
DOIs | |
Publication status | Published - 24-Jan-2019 |
Externally published | Yes |
Keywords
- Algorithms
- Chromosome Mapping
- Computational Biology
- Databases, Factual
- Databases, Genetic
- Genome, Human
- Genomics
- Humans
- Sequence Alignment
- Sequence Analysis, DNA
- Software
- Workflow