Performance optimization in DNA short-read alignment
Author:
Richard Wilton (1),
Alexander S. Szalay (1, 2)
Affiliation:
1. Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA
2. Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Abstract
Summary
Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged. In this review, we examine three general-purpose short-read alignment tools—BWA-MEM, Bowtie 2 and Arioc—with a focus on performance optimization. We analyze the performance-related behavior of the algorithms and heuristics each tool implements, with the goal of arriving at practical methods of improving processing speed and accuracy. We indicate where an aligner's default behavior may result in suboptimal performance, explore the effects of computational constraints such as end-to-end mapping and alignment scoring threshold, and discuss sources of imprecision in the computation of alignment scores and mapping quality. With this perspective, we describe an approach to tuning short-read aligner performance to meet specific data-analysis and throughput requirements while avoiding potential inaccuracies in subsequent analysis of alignment results. Finally, we illustrate how this approach avoids easily overlooked pitfalls and leads to verifiable improvements in alignment speed and accuracy.
Contact
richard.wilton@jhu.edu
Supplementary information
Appendices referenced in this article are available at Bioinformatics online.
Funder
National Institutes of Health
Johns Hopkins Department of Physics and Astronomy
Lieber Institute for Brain Development
Extreme Science and Engineering Discovery Environment
UCSD Expanse and Purdue Anvil, XSEDE
National Science Foundation
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability
Cited by
5 articles.